[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204939#comment-15204939 ]
Mridul Muralidharan commented on SPARK-1239:
--------------------------------------------

[~tgraves] For the last part (the waiting bit) - why not set the threshold at which Broadcast is used instead of direct serialization low enough that the problem effectively goes away? In my case I was using a fairly high number, but nothing stops us from using, say, 1 MB - at which point the number of outstanding requests needed to cause a memory issue becomes so high that it is practically impossible to hit. In general, I don't like the idea of waiting for IO to complete - different nodes can have different loads, so over time the driver may end up not responding to fast nodes because slow nodes are holding up the response. (A rough sketch of the threshold argument follows after the quoted issue summary below.)

> Don't fetch all map output statuses at each reducer during shuffles
> -------------------------------------------------------------------
>
>                 Key: SPARK-1239
>                 URL: https://issues.apache.org/jira/browse/SPARK-1239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Patrick Wendell
>            Assignee: Thomas Graves
>
> Instead we should modify the way we fetch map output statuses to take both a
> mapper and a reducer - or we should just piggyback the statuses on each task.
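As a rough illustration of the threshold argument above, here is a minimal Scala sketch. It is not Spark's actual MapOutputTracker code; the threshold value and all names (minSizeForBroadcast, useBroadcast, worstCaseDirectBytes) are hypothetical and only show why capping direct responses bounds driver memory: with every directly served payload below the threshold, N outstanding requests can pin at most N times the threshold in bytes.

    object MapStatusThresholdSketch {

      // Hypothetical threshold: serialized statuses at or above this size would be
      // broadcast instead of being sent back directly by the driver (e.g. 1 MB).
      val minSizeForBroadcast: Long = 1L * 1024 * 1024

      // Decide how one shuffle's serialized map statuses would be served.
      def useBroadcast(serializedSize: Long): Boolean =
        serializedSize >= minSizeForBroadcast

      // Every directly served response is below the threshold, so N outstanding
      // requests can pin at most N * minSizeForBroadcast bytes on the driver.
      def worstCaseDirectBytes(outstandingRequests: Int): Long =
        outstandingRequests.toLong * minSizeForBroadcast

      def main(args: Array[String]): Unit = {
        println(useBroadcast(5L * 1024 * 1024))  // true: a 5 MB payload would be broadcast
        println(useBroadcast(200L * 1024))       // false: a 200 KB payload is served directly
        println(worstCaseDirectBytes(1000))      // 1048576000 bytes, i.e. ~1 GB for 1000 requests
      }
    }

With a 1 MB cap, even a thousand concurrent reducer requests pin only about 1 GB on the driver, so the number of outstanding requests required to cause real memory pressure becomes impractically large.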