[
https://issues.apache.org/jira/browse/SPARK-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-2156:
-----------------------------------
Priority: Blocker (was: Critical)
> When the size of serialized results for one partition is slightly smaller
> than 10MB (the default akka.frameSize), the execution blocks
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-2156
> URL: https://issues.apache.org/jira/browse/SPARK-2156
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 0.9.1, 1.0.0
> Environment: AWS EC2, 1 master and 2 slaves, instance type
> r3.2xlarge
> Reporter: Chen Jin
> Assignee: Xiangrui Meng
> Priority: Blocker
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> I have done some experiments with frameSize values around 10MB (a
> reproduction sketch follows below).
> 1) spark.akka.frameSize = 10
> If one partition's serialized result is very close to 10MB, say 9.97MB, the
> execution blocks without any exception or warning. The worker finishes the
> task and sends the serialized result, then throws an exception saying the
> Hadoop IPC client connection stopped (visible after changing the logging to
> debug level). However, the master never receives the results and the program
> just hangs.
> But if the sizes of all partitions are below some threshold between 9.96MB
> and 9.97MB, the program works fine.
> 2) spark.akka.frameSize = 9
> When the partition size is just a little bit smaller than 9MB, it fails as
> well.
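> A minimal reproduction sketch of the setup described above (not part of the
> original report; the single-partition byte-array payload and the 9.97MB
> sizing are assumptions chosen to land just under the frame size, presumably
> so that the serialized result plus Akka's message overhead exceeds the 10MB
> limit even though the raw result does not):
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> object FrameSizeRepro {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("FrameSizeRepro")
>       .set("spark.akka.frameSize", "10") // frame size in MB, as in experiment 1
>     val sc = new SparkContext(conf)
>
>     // One partition whose serialized result is ~9.97MB, just under the
>     // 10MB frame size.
>     val payloadBytes = (9.97 * 1024 * 1024).toInt
>     val rdd = sc.parallelize(Seq(0), numSlices = 1)
>       .map(_ => new Array[Byte](payloadBytes))
>
>     // On the affected versions this collect() is expected to hang: the
>     // worker finishes the task but the result never reaches the master.
>     val result = rdd.collect()
>     println(s"collected ${result.head.length} bytes")
>     sc.stop()
>   }
> }
> {code}
> Shrinking payloadBytes below roughly 9.96MB, or raising
> spark.akka.frameSize, should make the same job complete normally.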
> This behavior is not exactly what SPARK-1112 is about.
--
This message was sent by Atlassian JIRA
(v6.2#6252)