[jira] [Commented] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

Mridul Muralidharan (JIRA) Sun, 08 Mar 2015 17:44:54 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352377#comment-14352377
 ]


Mridul Muralidharan commented on SPARK-1239:
--------------------------------------------

Hitting the akka frame size in the map output tracker is very easy, since we 
fetch the whole output (m * r). I can't get into the specifics of our jobs or 
share logs, but it is easy to see this hitting 1G for 100k mappers and 50k 
reducers.
If this is not being looked into currently, I can add it to my list of things 
to fix, but if there is already work being done, I don't want to duplicate it.
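
To make the m * r estimate concrete, here is a rough back-of-the-envelope 
sketch. It assumes one byte per (mapper, reducer) size entry and ignores 
per-mapper location metadata and any compression of the serialized payload. 
Both are assumptions made for illustration, and the function name is 
hypothetical.

```python
def map_status_payload_bytes(num_mappers, num_reducers, bytes_per_entry=1):
    """Uncompressed size of a full m * r table of map output sizes.

    bytes_per_entry=1 is an assumption made for this sketch.
    """
    return num_mappers * num_reducers * bytes_per_entry


mappers = 100_000
reducers = 50_000

# Every reducer receives the full table today.
full_table = map_status_payload_bytes(mappers, reducers)
# A reducer only needs its own column: one entry per mapper.
per_reducer = map_status_payload_bytes(mappers, 1)

print(f"entries in full table:   {mappers * reducers:,}")
print(f"full table (bytes):      {full_table:,}")
print(f"one reducer's slice:     {per_reducer:,}")
```

Under these assumptions the full table is 5,000,000,000 bytes before 
compression, so even a 5x reduction leaves it around the 1G mark, while a 
single reducer's slice is only 100,000 bytes.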

Even something trivial like what was done for task results would suffice (if we 
don't want the additional overhead of generating per-reducer map output at the 
master).
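
A minimal sketch of that kind of fallback, assuming the task-result approach 
means "send the payload directly when it fits in a frame, otherwise park it in 
a block store and send only a reference". The `BlockStore` class, the function 
names, and the 10 MB limit are hypothetical stand-ins for illustration, not 
Spark's actual API.

```python
import pickle

FRAME_SIZE_BYTES = 10 * 1024 * 1024  # assumed frame limit for this sketch


class BlockStore:
    """Hypothetical stand-in for a store that can serve large payloads."""

    def __init__(self):
        self._blocks = {}

    def put(self, block_id, data):
        self._blocks[block_id] = data

    def get(self, block_id):
        return self._blocks[block_id]


def build_reply(shuffle_id, statuses, store, frame_size=FRAME_SIZE_BYTES):
    """Reply directly if the payload fits in a frame, else by reference."""
    payload = pickle.dumps(statuses)
    if len(payload) < frame_size:
        return ("direct", payload)
    block_id = f"map_statuses_{shuffle_id}"
    store.put(block_id, payload)
    return ("indirect", block_id)


def read_reply(reply, store):
    """Resolve either kind of reply back into the original statuses."""
    kind, body = reply
    data = body if kind == "direct" else store.get(body)
    return pickle.loads(data)
```

For example, with an artificially small frame size:

```python
store = BlockStore()
small = build_reply(1, [1, 2, 3], store, frame_size=1024)
large = build_reply(2, list(range(10_000)), store, frame_size=1024)
print(small[0], large[0])
```

This does not shrink the m * r payload; it only avoids the frame size limit, 
which is why it counts as the trivial option.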

> Don't fetch all map output statuses at each reducer during shuffles
> -------------------------------------------------------------------
>
>                 Key: SPARK-1239
>                 URL: https://issues.apache.org/jira/browse/SPARK-1239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Patrick Wendell
>
> Instead we should modify the way we fetch map output statuses to take both a 
> mapper and a reducer - or we should just piggyback the statuses on each task. 
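
The first option in the description amounts to serving each reducer only its 
own column of the mapper-by-reducer size table. A small sketch, using a 
hypothetical in-memory table and function name rather than Spark's real data 
structures:

```python
def statuses_for_reducer(sizes, reduce_id):
    """Return the one column of the m x r size table that a reducer needs."""
    return [row[reduce_id] for row in sizes]


# Rows are mappers, columns are reducers; values are map output sizes.
sizes = [
    [10, 0, 7],
    [3, 5, 0],
]

print(statuses_for_reducer(sizes, 1))
```

Each reply then carries m entries instead of m * r, at the cost of the master 
building one slice per reducer.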



