[
https://issues.apache.org/jira/browse/HADOOP-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672713#action_12672713
]
Owen O'Malley commented on HADOOP-5223:
---------------------------------------
Roughly, I think the flow should look like:
EventFetcher -> HostPlanner -> FetcherPool -> OutputMerger
There is also a main shuffle object that tracks the progress of the shuffle.
Each of these should be a separate class. The EventFetcher gets the map
completion events from the TaskTracker. The HostPlanner keeps track of
available map outputs and the penalty box, and hands out ready hosts to the
fetchers. The FetcherPool is a pool of threads that do the actual copying of
data. The OutputMerger manages the in-memory and on-disk data and has a thread
to do merges.
We'll post a patch with the API soon.
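To make the flow above concrete, here is a rough, purely illustrative Java
sketch. It is not the forthcoming patch. Only the stage names (EventFetcher,
HostPlanner, FetcherPool, OutputMerger, main shuffle object) come from the
description; every method name, the "host:mapId" event format, and the reading
of the penalty box as a set of hosts that are temporarily skipped are
assumptions made for the example. Copying and merging are faked with integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShuffleSketch {

    /** Tracks available map outputs per host, plus a penalty box (assumed: hosts to skip). */
    static class HostPlanner {
        private final Map<String, List<Integer>> available = new LinkedHashMap<>();
        private final Set<String> penaltyBox = new HashSet<>();

        synchronized void addMapOutput(String host, int mapId) {
            available.computeIfAbsent(host, h -> new ArrayList<>()).add(mapId);
        }

        synchronized void penalize(String host) { penaltyBox.add(host); }

        /** Hands out the map ids of one ready host, or null if no host is ready. */
        synchronized List<Integer> nextHost() {
            String ready = null;
            for (String host : available.keySet()) {
                if (!penaltyBox.contains(host)) { ready = host; break; }
            }
            return ready == null ? null : available.remove(ready);
        }
    }

    /** Holds fetched outputs; a real merger would manage memory/disk and run a merge thread. */
    static class OutputMerger {
        private final List<Integer> fetched = new ArrayList<>();

        synchronized void add(int mapId) { fetched.add(mapId); }

        synchronized List<Integer> merge() {
            List<Integer> merged = new ArrayList<>(fetched);
            Collections.sort(merged);
            return merged;
        }
    }

    /** Main shuffle object: wires the stages together and tracks progress. */
    static class Shuffle {
        final HostPlanner planner = new HostPlanner();
        final OutputMerger merger = new OutputMerger();
        final AtomicInteger copied = new AtomicInteger();

        /** Stand-in for the EventFetcher: feeds hypothetical "host:mapId" events to the planner. */
        void fetchEvents(List<String> events) {
            for (String event : events) {
                String[] parts = event.split(":");
                planner.addMapOutput(parts[0], Integer.parseInt(parts[1]));
            }
        }

        /** Stand-in for the FetcherPool: worker threads drain hosts until none is ready. */
        void runFetchers(int threads) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    List<Integer> mapIds;
                    while ((mapIds = planner.nextHost()) != null) {
                        for (int mapId : mapIds) {
                            merger.add(mapId); // a real fetcher would copy bytes here
                            copied.incrementAndGet();
                        }
                    }
                });
            }
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Runs the whole flow once; hostC sits in the penalty box, so map 3 is never fetched. */
    static List<Integer> demo() {
        Shuffle shuffle = new Shuffle();
        shuffle.fetchEvents(Arrays.asList("hostA:2", "hostB:0", "hostA:1", "hostC:3"));
        shuffle.planner.penalize("hostC");
        shuffle.runFetchers(2);
        return shuffle.merger.merge();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [0, 1, 2]
    }
}
```

The planner is the only stage the fetcher threads contend on, so in this sketch
its methods are simply synchronized; a real implementation would presumably
also release hosts from the penalty box after some delay and block fetchers
until new events arrive rather than letting them exit.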
> Refactor reduce shuffle code
> ----------------------------
>
> Key: HADOOP-5223
> URL: https://issues.apache.org/jira/browse/HADOOP-5223
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.21.0
>
>
> The reduce shuffle code has become very complex and entangled. I think we
> should move it out of ReduceTask and into a separate package
> (org.apache.hadoop.mapred.task.reduce). Details to follow.