Thanks for your answer, Imran. I haven't tried your suggestions yet, but
setting spark.shuffle.blockTransferService=nio solved my issue. There is a
JIRA for this: https://issues.apache.org/jira/browse/SPARK-6962.
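For reference, this is roughly how the property can be set programmatically (a minimal sketch; the app name and job body are placeholders, and the same key/value can also be passed with --conf on spark-submit or put into conf/spark-defaults.conf):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class NioShuffleWorkaround {
    public static void main(String[] args) {
        // Switch the shuffle transfer service from the default ("netty" in
        // recent 1.x releases) to "nio". This applies to Spark 1.x; the nio
        // option was later removed from Spark.
        SparkConf conf = new SparkConf()
                .setAppName("NioShuffleWorkaround") // placeholder name
                .set("spark.shuffle.blockTransferService", "nio");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}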
Zsolt
2015-04-14 21:57 GMT+02:00 Imran Rashid iras...@cloudera.com:
I use EMR 3.3.1 which comes with Java 7. Do you think that this may cause
the issue? Did you test it with Java 8?
Zsolt - what version of Java are you running?
On Mon, Mar 30, 2015 at 7:12 AM, Zsolt Tóth toth.zsolt@gmail.com
wrote:
Thanks for your answer!
I don't call .collect because I want to trigger the execution. I call it
because I need the RDD on the driver. This is not a huge RDD and it's not
larger than the one returned with 50GB input data.
The end of the stack trace:
The two IPs are the two worker nodes, I think.
Don't call .collect() if your data size is huge; you can simply do a count()
to trigger the execution.
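Something like this (a rough sketch; the input path and RDD are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CountVsCollect {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("CountVsCollect"));
        // Placeholder input path.
        JavaRDD<String> output = sc.textFile("hdfs:///path/to/input");

        // output.collect() would ship every element back to the driver,
        // which can exhaust driver memory on a large RDD.

        // count() runs the same job but returns only a single long.
        long n = output.count();
        System.out.println("count = " + n);
        sc.stop();
    }
}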
Can you paste your exception stack trace so that we'll know what's happening?
Thanks
Best Regards
On Fri, Mar 27, 2015 at 9:18 PM, Zsolt Tóth toth.zsolt@gmail.com
wrote:
Hi,
I have a simple Spark application: it creates an input RDD with
sc.textFile, and it calls flatMapToPair, reduceByKey and map on it. The
output RDD is small, a few MBs. Then I call collect() on the output.
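In outline the job looks like this (a simplified sketch using Java 8 lambdas
against the Spark 1.x Java API; the word-count-style logic and the path are
placeholders, not the actual transformations):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("PipelineSketch"));

        // Input RDD from a (placeholder) text file.
        JavaRDD<String> input = sc.textFile("hdfs:///path/to/input");

        // flatMapToPair: emit (token, 1) pairs per line. In Spark 1.x the
        // pair flat-map function returns an Iterable of tuples.
        JavaPairRDD<String, Integer> pairs = input.flatMapToPair(line -> {
            List<Tuple2<String, Integer>> out = new ArrayList<>();
            for (String token : line.split("\\s+")) {
                out.add(new Tuple2<>(token, 1));
            }
            return out;
        });

        // reduceByKey: sum the counts per key.
        JavaPairRDD<String, Integer> reduced =
                pairs.reduceByKey((a, b) -> a + b);

        // map: format each pair; the resulting RDD is small (a few MBs).
        JavaRDD<String> formatted =
                reduced.map(t -> t._1() + "\t" + t._2());

        // collect() the small output on the driver.
        List<String> result = formatted.collect();
        System.out.println("collected " + result.size() + " records");
        sc.stop();
    }
}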
If the text file is ~50GB, it finishes in a few minutes. However, if it's
larger