Chen Jin created SPARK-2156:
-------------------------------
Summary: When the serialized result for any partition is slightly
smaller than 10MB (the default akka.frameSize), the execution blocks
Key: SPARK-2156
URL: https://issues.apache.org/jira/browse/SPARK-2156
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.0.0
Environment: AWS EC2, 1 master and 2 slaves, instance type
r3.2xlarge
Reporter: Chen Jin
Priority: Critical
Fix For: 1.0.1
I ran some experiments with the frameSize set to around 10MB.
1) spark.akka.frameSize = 10
If the size of any one partition is very close to 10MB, say 9.97MB, the execution
blocks without any exception or warning. The worker finishes the task and proceeds to
send the serialized result, then throws an exception saying that the Hadoop IPC client
connection stopped (visible after changing the logging to debug level). However, the
master never receives the result and the program just hangs.
But if the sizes of all the partitions are below some number between 9.96MB and
9.97MB, the program works fine.
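To put the numbers from experiment 1 in perspective, the sketch below converts them to bytes. It assumes that 1MB means 1024 * 1024 bytes and that the frameSize setting is interpreted as that many MB; the helper names are made up for illustration. Under those assumptions the failing 9.97MB result still sits roughly 30KB below the frame size, which would be consistent with some per-message overhead pushing it over the limit, though that is only a guess and not something the logs confirm.

```python
MIB = 1024 * 1024  # assumption: "MB" in the report means 2**20 bytes


def frame_bytes(frame_size_mb: int) -> int:
    """Frame size in bytes for a given frameSize setting (in MB)."""
    return frame_size_mb * MIB


def headroom_bytes(frame_size_mb: int, result_mb: float) -> int:
    """Bytes left between a serialized result and the frame size."""
    return frame_bytes(frame_size_mb) - int(result_mb * MIB)


if __name__ == "__main__":
    # 9.96MB was observed to work, 9.97MB was observed to hang.
    for result_mb in (9.96, 9.97):
        print(result_mb, "MB leaves", headroom_bytes(10, result_mb), "bytes")
```

If the same headroom applies regardless of the setting, a result needs to stay somewhere between about 30KB and 41KB below the configured frame size to get through.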
2) spark.akka.frameSize = 9
When the partition size is just slightly smaller than 9MB, it fails as well.
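Since the same pattern shows up for both settings, the exact threshold could be narrowed down by bisecting on the result size. The following is only a sketch of that search: `passes` is a hypothetical callback that would run one job with a result of the given size and report whether it completed, and the search assumes that every size below the threshold works and every size above it hangs.

```python
from typing import Callable


def largest_passing_size(lo: int, hi: int, passes: Callable[[int], bool]) -> int:
    """Return the largest size in [lo, hi) for which passes() is True.

    Assumes passes(lo) is True, passes(hi) is False, and that passes()
    switches from True to False exactly once in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(mid):
            lo = mid  # mid still works, so the threshold is at or above mid
        else:
            hi = mid  # mid hangs, so the threshold is below mid
    return lo


if __name__ == "__main__":
    frame = 9 * 1024 * 1024
    # Stand-in predicate with an invented cutoff, purely for demonstration.
    made_up_cutoff = frame - 35_000
    print(largest_passing_size(0, frame, lambda n: n <= made_up_cutoff))
```

In a real run each probe would have to be wrapped in a timeout, because the failure mode here is a hang rather than an exception.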
This behavior is not exactly what SPARK-1112 is about. Could you please
advise me on how to open a separate bug for the case where the serialized size
is very close to 10MB?
--
This message was sent by Atlassian JIRA
(v6.2#6252)