Naveen,
Don't worry - you're not the only one to be bitten by this. A quick look at
the Javadoc shows you have another option:
JavaRDD<Integer> distData = sc.parallelize(data, 100);
Now the RDD is split into 100 partitions.
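For reference, here is a self-contained version of that call as a minimal
sketch against the Spark 1.x Java API (class and variable names are
illustrative, not from the original post):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("parallelize-example"));
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);

        // The second argument is numSlices: the number of partitions
        // the resulting RDD is split into.
        JavaRDD<Integer> distData = sc.parallelize(data, 100);
        System.out.println(distData.count()); // prints 5
        sc.stop();
    }
}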
Hi, my programming model requires me to generate multiple RDDs for various
datasets across a single run and then run an action on each of them, e.g.:
MyFunc myFunc = ... //It implements VoidFunction
//set some extra variables - all serializable
...
for (JavaRDD<String> rdd : rddList) {
...
Excuse me - the line inside the loop should read: rdd.foreach(myFunc) - not
sc.
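For anyone else hitting the NotSerializableException from this thread, here
is a minimal, self-contained sketch of the corrected pattern - assuming
MyFunc is a top-level class whose per-RDD variable is fixed at construction
(all names below are illustrative, not the original poster's code):

import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class ForeachLoopSketch {

    // Top-level function class: Spark serializes this object and ships it
    // to the executors, so every field it holds must be serializable.
    static class MyFunc implements VoidFunction<String>, Serializable {
        private final String tag; // per-RDD variable, fixed at construction

        MyFunc(String tag) { this.tag = tag; }

        @Override
        public void call(String line) throws Exception {
            System.out.println(tag + ": " + line); // business logic goes here
        }
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("foreach-loop-sketch"));

        List<JavaRDD<String>> rddList = new ArrayList<JavaRDD<String>>();
        rddList.add(sc.parallelize(Arrays.asList("a", "b")));
        rddList.add(sc.parallelize(Arrays.asList("c", "d")));

        int i = 0;
        for (JavaRDD<String> rdd : rddList) {
            rdd.foreach(new MyFunc("dataset-" + i++)); // one action per RDD
        }
        sc.stop();
    }
}

Because Spark serializes the function and ships it to the executors,
anything the function references must itself be serializable; keeping the
function a top-level (or static nested) class avoids accidentally capturing
a non-serializable outer object such as the JavaSparkContext.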
Sorry - I'll furnish some details below. However, union is not an option for
the business logic I have. The function generates a specific file based on a
variable passed in via a setter on the function, and this variable changes
with each RDD. I annotated the log line where the first run
I'm stumped with this one. I'm using YARN on EMR to distribute my Spark job.
While the job initially seems to start up fine, the Spark executor nodes are
having trouble pulling the jars from the location on HDFS where the master
just put the files.
[hadoop@ip-172-16-2-167 ~]$
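For anyone comparing setups, a typical YARN submission in the Spark 1.x era
looks roughly like the sketch below; the jar path, class name, and executor
count are placeholders, not the poster's actual values:

./bin/spark-submit \
  --class com.example.MyJob \
  --master yarn-cluster \
  --num-executors 4 \
  /path/to/my-job-assembly.jar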
Not sure I can help, but I ran into the same problem. Basically my use case
is that I have a List of strings, which I then convert into an RDD using
sc.parallelize(). This RDD is then operated on by foreach(). Same as you, I
get a runtime exception:
java.lang.ClassCastException:
Facing a funny issue with the Spark class loader. While testing out some
basic functionality on a vagrant VM with Spark running, it looks like Spark
is attempting to ship the jar to a remote instance (in this case local) and
is somehow encountering the jar twice?
14/07/11 23:27:59 INFO DAGScheduler: Got job 0
Hi,
I'm trying to get rid of an error (NoSuchMethodError) while using Amazon's
S3 client on Spark. I'm using the spark-submit script to run my code.
Reading about my options and other threads, it seemed the most logical way
would be to make sure my jar is loaded first. Spark submit on debug shows
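One knob that comes up in these classpath-ordering threads is the
experimental userClassPathFirst setting. The exact property name and
whether --conf is accepted vary by Spark version, so treat this as an
assumption to verify against your version's configuration docs:

# Ask executors to prefer the user jar over Spark's own classpath
# (property name and availability depend on the Spark version).
./bin/spark-submit \
  --conf spark.files.userClassPathFirst=true \
  --class com.example.MyJob \
  /path/to/my-job-assembly.jar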
I'm new to Spark and not very experienced with Scala issues. I'm facing this
error message while trying to start up Spark on Mesos on a vagrant box.
vagrant@mesos:~/installs/spark-1.0.0$ java -cp
rickshaw-spark-0.0.1-SNAPSHOT.jar
com.evocalize.rickshaw.spark.applications.GenerateSEOContent -m