[
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758371#action_12758371
]
Scott Carey commented on MAPREDUCE-318:
---------------------------------------
In addition to a quick code review of the parts I was interested in (fetching
map output fragments), I ran a quick-and-dirty test of trunk on a tiny cluster
to confirm that this change has the same effect as the one-line fix I apply to
0.19.2 in production for similar benefits; see my comment from June 10, 2009.
The old code artificially throttled the shuffle to one output file per
TaskTracker (TT) per ping-cycle.
Quite simply, any fix that lets a reducer fetch all the completed map outputs
it finds in one ping-cycle helps jobs whose map output count is much greater
than the node count, whether it's a one-line hack or a full refactor.
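As a toy illustration of the difference (this is not the actual
ReduceTask/Shuffle code; every name below is invented), the old scheduling
loop takes at most one completed output per host per ping-cycle, while the
fixed behavior drains everything the reducer already knows about:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only. Models one reducer's fetch scheduling across a
// set of task trackers, each holding some completed map outputs.
public class ShuffleFetchSketch {

  // Old behavior: schedule at most ONE completed output per TT per cycle.
  static int throttledCycle(Map<String, Deque<String>> pendingByHost) {
    int scheduled = 0;
    for (Deque<String> outputs : pendingByHost.values()) {
      String one = outputs.poll();     // artificial throttle: one per host
      if (one != null) {
        scheduled++;                   // would hand off to a copier thread
      }
    }
    return scheduled;
  }

  // Fixed behavior: drain every completed output known in this ping-cycle.
  static int greedyCycle(Map<String, Deque<String>> pendingByHost) {
    int scheduled = 0;
    for (Deque<String> outputs : pendingByHost.values()) {
      while (outputs.poll() != null) { // take all that are ready
        scheduled++;
      }
    }
    return scheduled;
  }

  static Map<String, Deque<String>> cluster(int hosts, int outputsPerHost) {
    Map<String, Deque<String>> pending = new LinkedHashMap<>();
    for (int h = 0; h < hosts; h++) {
      Deque<String> q = new ArrayDeque<>();
      for (int m = 0; m < outputsPerHost; m++) {
        q.add("host" + h + "/map" + m);
      }
      pending.put("host" + h, q);
    }
    return pending;
  }

  public static void main(String[] args) {
    // 20 TTs with 50 completed map outputs each (1000 total).
    int cycles = 0;
    Map<String, Deque<String>> pending = cluster(20, 50);
    while (throttledCycle(pending) > 0) {
      cycles++;
    }
    System.out.println("throttled: " + cycles + " ping-cycles"); // 50

    pending = cluster(20, 50);
    System.out.println("greedy: " + greedyCycle(pending)
        + " fetches in 1 ping-cycle");                           // 1000
  }
}

Running it, the throttled loop needs 50 ping-cycles to schedule the same 1000
fetches the greedy loop schedules in one.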
The impact really depends on the cluster config and job type. Ours is new
hardware with plenty of RAM per node, which leads us to run ~11+ concurrent
map tasks per node and yields a larger ratio of map shards per reducer to
task trackers. The bigger that ratio, the bigger the impact of optimized
shuffle fetching.
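To put illustrative numbers on it (made up, not measurements from our
cluster): a job with 2000 map outputs per reducer on 20 task trackers needs
at least 2000 / 20 = 100 ping-cycles under the old one-per-TT behavior just
to schedule all the fetches, while a job with 100 map outputs on the same 20
TTs needs only 5, so the penalty grows with the map-shards-per-reducer to
task-tracker ratio.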
> Refactor reduce shuffle code
> ----------------------------
>
> Key: MAPREDUCE-318
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.21.0
>
> Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch,
> mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch,
> mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we
> should move it out of ReduceTask and into a separate package
> (org.apache.hadoop.mapred.task.reduce). Details to follow.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.