[ https://issues.apache.org/jira/browse/HADOOP-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672713#action_12672713 ]

Owen O'Malley commented on HADOOP-5223:
---------------------------------------

Roughly, I think the flow should look like:

EventFetcher -> HostPlanner -> FetcherPool -> OutputMerger

There is also a main shuffle object that tracks the progress of the shuffle.
Each of these should be a separate class. The EventFetcher gets the map
completion events from the TaskTracker. The HostPlanner keeps track of the
available map outputs and the penalty box (hosts held back after fetch
failures), and hands out hosts that are ready to the fetchers. The FetcherPool
is a pool of threads that do the actual copying of data. The OutputMerger
manages the in-memory and on-disk data and has a thread to do merges.
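A minimal Java sketch of the flow above, assuming Java 16+ (for `record`). Only the four stage names come from the comment; `MapOutput`, `takeReadyHost`, `finishHost`, `runShuffle`, the byte-count memory budget and the simulated flaky host are hypothetical, and none of this is Hadoop's actual shuffle API. The EventFetcher is reduced to a pre-filled event list, "copying" only reports a size, and "merging" only sums sizes on a separate merge thread.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the proposed shuffle pipeline; NOT Hadoop's real API.
public class ShuffleSketch {
    // A map-completion event: the host serving one map output, and its size.
    record MapOutput(String host, int mapId, long bytes) {}

    // HostPlanner: pending outputs per host, a penalty box, and hosts being fetched.
    static final class HostPlanner {
        private final Map<String, List<MapOutput>> pending = new HashMap<>();
        private final Map<String, Long> penaltyUntil = new HashMap<>();
        private final Set<String> busy = new HashSet<>();
        synchronized void addEvent(MapOutput out) {
            pending.computeIfAbsent(out.host(), h -> new ArrayList<>()).add(out);
        }
        // Hands out all pending outputs of one host that is neither busy nor penalized.
        synchronized List<MapOutput> takeReadyHost(long now) {
            Iterator<Map.Entry<String, List<MapOutput>>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, List<MapOutput>> e = it.next();
                String host = e.getKey();
                if (busy.contains(host) || penaltyUntil.getOrDefault(host, 0L) > now) continue;
                List<MapOutput> outs = e.getValue();
                it.remove();
                busy.add(host);
                return outs;
            }
            return null; // nothing is ready right now
        }
        // Releases a host; failed outputs go back to pending and the host is penalized.
        synchronized void finishHost(String host, List<MapOutput> failed, long penaltyMs) {
            busy.remove(host);
            if (failed.isEmpty()) return;
            pending.computeIfAbsent(host, h -> new ArrayList<>()).addAll(failed);
            penaltyUntil.put(host, System.currentTimeMillis() + penaltyMs);
        }
        synchronized boolean done() { return pending.isEmpty() && busy.isEmpty(); }
    }

    // OutputMerger: counts in-memory bytes; one merge thread "spills" full batches to disk.
    static final class OutputMerger {
        private final long memoryLimit;
        private final ExecutorService mergeThread = Executors.newSingleThreadExecutor();
        private long inMemoryBytes = 0, onDiskBytes = 0;
        OutputMerger(long memoryLimit) { this.memoryLimit = memoryLimit; }
        synchronized void add(long bytes) {
            inMemoryBytes += bytes;
            if (inMemoryBytes < memoryLimit) return;
            long batch = inMemoryBytes; // over budget: hand this batch to the merge thread
            inMemoryBytes = 0;
            mergeThread.submit(() -> spill(batch));
        }
        // Simulated merge: the batch becomes on-disk data of the same total size.
        private synchronized void spill(long batch) { onDiskBytes += batch; }
        // Waits for outstanding merges, then reports bytes on disk plus bytes in memory.
        long finish() throws InterruptedException {
            mergeThread.shutdown();
            mergeThread.awaitTermination(10, TimeUnit.SECONDS);
            synchronized (this) { return onDiskBytes + inMemoryBytes; }
        }
    }

    // Wires planner, FetcherPool threads and merger; the EventFetcher is replaced by a fixed list.
    static long runShuffle(List<MapOutput> events, int fetchers, long memoryLimit, String flakyHost)
            throws InterruptedException {
        HostPlanner planner = new HostPlanner();
        events.forEach(planner::addEvent);
        OutputMerger merger = new OutputMerger(memoryLimit);
        Set<String> failedOnce = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(fetchers); // the FetcherPool
        for (int i = 0; i < fetchers; i++) {
            pool.submit(() -> {
                while (!planner.done()) {
                    List<MapOutput> outs = planner.takeReadyHost(System.currentTimeMillis());
                    if (outs == null) { Thread.sleep(5); continue; }
                    String host = outs.get(0).host();
                    if (host.equals(flakyHost) && failedOnce.add(host)) { // simulated first failure
                        planner.finishHost(host, outs, 20);
                        continue;
                    }
                    for (MapOutput o : outs) merger.add(o.bytes()); // the simulated "copy"
                    planner.finishHost(host, List.of(), 0);
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return merger.finish();
    }

    public static void main(String[] args) throws InterruptedException {
        List<MapOutput> events = List.of(new MapOutput("hostA", 0, 40), new MapOutput("hostA", 1, 40),
                new MapOutput("hostB", 2, 30), new MapOutput("hostC", 3, 50));
        System.out.println(runShuffle(events, 2, 64, "hostB")); // prints 160
    }
}
```

In this sketch a fetcher is handed a whole host rather than a single map output, and the host is marked busy until it is released, so two fetchers never copy from the same host at once. A host whose fetch fails gets its outputs put back and stays in the penalty box until its deadline passes; every output is still merged exactly once, which is why the demo totals 160 bytes.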

We'll post a patch with the api soon.

> Refactor reduce shuffle code
> ----------------------------
>
>                 Key: HADOOP-5223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5223
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.21.0
>
>
> The reduce shuffle code has become very complex and entangled. I think we 
> should move it out of ReduceTask and into a separate package 
> (org.apache.hadoop.mapred.task.reduce). Details to follow.

-- 
This message is automatically generated by JIRA.