Hi,
I'm trying to replace values in a nested column in a JSON-based dataframe
using withColumn().
This syntax works for select, filter, etc., giving only the nested "country"
column:
df.select('body.payload.country')
but if I use the same syntax in withColumn(), it will create a new column with the name
Any comment on this one?
On Nov 16, 2016 at 12:59 PM, "Zsolt Tóth" <toth.zsolt@gmail.com> wrote:
> Hi,
>
> I need to run a map() and a mapPartitions() on my input DF. As a
> side-effect of the map(), a partition-local variable should be updated,
> that is used in the mapPartitions() afterwards.
Hi,
I need to run a map() and a mapPartitions() on my input DF. As a
side-effect of the map(), a partition-local variable should be updated,
that is used in the mapPartitions() afterwards.
I can't use a Broadcast variable, because it's shared between partitions on
the same executor.
Where can I
based on the renew-interval instead of the max-lifetime?
2016-11-04 2:37 GMT+01:00 Marcelo Vanzin <van...@cloudera.com>:
> On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth <toth.zsolt@gmail.com>
> wrote:
> > What is the purpose of the delegation token renewal (the one that i
> extend its lifetime. The feature you're talking about is for
> creating *new* delegation tokens after the old ones expire and cannot
> be renewed anymore (i.e. the max-lifetime configuration).
>
> On Thu, Nov 3, 2016 at 2:02 PM, Zsolt Tóth <toth.zso
definitely exists and people definitely have run into it. So
> if you're not hitting it, it's most definitely an issue with your test
> configuration.
>
> On Thu, Nov 3, 2016 at 7:22 AM, Zsolt Tóth <toth.zsolt@gmail.com>
> wrote:
> > Hi,
> >
> > I ran some tests regarding Spark's Delegation Token renewal mechanism.
Any ideas about this one? Am I missing something here?
2016-11-03 15:22 GMT+01:00 Zsolt Tóth <toth.zsolt@gmail.com>:
> Hi,
>
> I ran some tests regarding Spark's Delegation Token renewal mechanism. As
> I see, the concept here is simple: if I give my keytab file and
Hi,
I ran some tests regarding Spark's Delegation Token renewal mechanism. As I
see, the concept here is simple: if I give my keytab file and client
principal to Spark, it starts a token renewal thread, and renews the
namenode delegation tokens after some time. This works fine.
Then I tried to
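For context, the renewal thread described above is driven by the keytab and principal handed to spark-submit. A typical invocation (the principal, keytab path, and jar name are placeholders):

```shell
# Placeholders: adjust principal, keytab path, and application jar.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal zsolt@EXAMPLE.COM \
  --keytab /path/to/zsolt.keytab \
  my-app.jar
```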
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
Regards,
Zsolt
2016-02-12 13:11 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
> Can you pastebin the full error with all column types ?
>
> There should be a difference between some column(s).
>
> Cheers
>
> > On Feb 11, 2016, at 2
> outputDF = unlabelledDF.join(predictedDF.select("id", "predicted"), "id")
>
> On 11 February 2016 at 10:12, Zsolt Tóth <toth.zsolt@gmail.com> wrote:
>
>> Hi,
>>
>> I'd like to append a column of a dataframe to another DF (using Spark
>> 1.5.2):
>>
>
Hi,
I'd like to append a column of a dataframe to another DF (using Spark
1.5.2):
DataFrame outputDF = unlabelledDF.withColumn("predicted_label",
predictedDF.col("predicted"));
I get the following exception:
java.lang.IllegalArgumentException: requirement failed: DataFrame must have
the same
Hi,
I have a Spark job with many transformations (sequence of maps and
mapPartitions) and only one action in the end (DataFrame.write()). The
transformations return an RDD, so I need to create a DataFrame.
To be able to use sqlContext.createDataFrame() I need to know the schema of
the Row but for
Hi,
this is exactly the same as my issue, seems to be a bug in 1.5.x.
(see my thread for details)
2015-11-19 11:20 GMT+01:00 Jeff Zhang:
> Seems your jdbc url is not correct. Should be
> jdbc:mysql://192.168.41.229:3306
>
> On Thu, Nov 19, 2015 at 6:03 PM,
Hi,
I'm trying to throw an exception of my own exception class (MyException extends
SparkException) on one of the executors. This works fine on Spark 1.3.x,
1.4.x but throws a deserialization/ClassNotFound exception on Spark 1.5.x.
This happens only when I throw it on an executor, on the driver it
Hi Tamás,
the exception class is in the application jar, I'm using the spark-submit
script.
2015-11-19 11:54 GMT+01:00 Tamas Szuromi <tamas.szur...@odigeo.com>:
> Hi Zsolt,
>
> How do you load the jar, and how do you prepend it to the classpath?
>
> Tamas
>
Hi,
I ran your example on Spark-1.4.1 and 1.5.0-rc3. It succeeds on 1.4.1 but
throws the OOM on 1.5.0. Do any of you know which PR introduced this
issue?
Zsolt
2015-09-07 16:33 GMT+02:00 Zoltán Zvara:
> Hey, I'd try to debug, profile ResolvedDataSource. As far as I
Hi all,
it looks like the 1.2.2 pre-built version for hadoop2.4 is not available on
the mirror sites. Am I missing something?
Regards,
Zsolt
On Wed, Apr 8, 2015 at 3:45 AM, Zsolt Tóth toth.zsolt@gmail.com
wrote:
I use EMR 3.3.1 which comes with Java 7. Do you think that this may cause
the issue? Did you test it with Java 8?
via the SQL data source API:
https://github.com/apache/spark/pull/3753. You can try pulling that PR
and help test it. -Xiangrui
On Wed, Mar 25, 2015 at 5:03 AM, Zsolt Tóth toth.zsolt@gmail.com
wrote:
Hi,
I use sc.hadoopFile(directory, OrcInputFormat.class, NullWritable.class
huge, you can simply do a count() to
trigger the execution.
Can you paste your exception stack trace so that we'll know whats
happening?
Thanks
Best Regards
On Fri, Mar 27, 2015 at 9:18 PM, Zsolt Tóth toth.zsolt@gmail.com
wrote:
Hi,
I have a simple Spark application: it creates
Hi,
I have a simple Spark application: it creates an input rdd with
sc.textFile, and it calls flatMapToPair, reduceByKey and map on it. The
output rdd is small, a few MBs. Then I call collect() on the output.
If the text file is ~50GB, it finishes in a few minutes. However, if it's
larger
Hi,
I use sc.hadoopFile(directory, OrcInputFormat.class, NullWritable.class,
OrcStruct.class) to use data in ORC format as an RDD. I did some
benchmarking on ORC input vs Text input for MLlib and I ran into a few
issues with ORC.
Setup: yarn-cluster mode, 11 executors, 4 cores, 9g executor
Hi,
I submit spark jobs in yarn-cluster mode remotely from java code by calling
Client.submitApplication(). For some reason I want to use 1.3.0 jars on the
client side (e.g spark-yarn_2.10-1.3.0.jar) but I have
spark-assembly-1.2.1* on the cluster.
The problem is that the ApplicationMaster can't
One more question: Is there a reason why Spark throws an error when
requesting too much memory instead of capping it to the maximum value (as
YARN would do by default)?
Thanks!
2015-02-10 17:32 GMT+01:00 Zsolt Tóth toth.zsolt@gmail.com:
Hi,
I'm using Spark in yarn-cluster mode and submit
Hi,
I'm using Spark in yarn-cluster mode and submit the jobs programmatically
from the client in Java. I ran into a few issues when tried to set the
resource allocation properties.
1. It looks like setting spark.executor.memory, spark.executor.cores and
spark.executor.instances has no effect
Hi,
I use DecisionTree for multi-class classification.
I can get the probability of the predicted label for every node in the
decision tree from node.predict().prob(). Is it possible to retrieve or
count the probability of every possible label class in the node?
To be more clear:
Say in Node A