[ https://issues.apache.org/jira/browse/MAPREDUCE-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136769#comment-13136769 ]
Todd Lipcon commented on MAPREDUCE-3278:
----------------------------------------

AFAIK this only applies to the 0.20 code. The Shuffle was substantially rewritten for 0.21 by MAPREDUCE-318, which also did a big refactor. This JIRA is for a more targeted bug fix on the stable branch.

> 0.20: avoid a busy-loop in ReduceTask scheduling
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3278
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3278
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance, task
>    Affects Versions: 0.20.205.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Looking at profiling results, it became clear that the ReduceTask has the
> following busy-loop which was causing it to suck up 100% of CPU in the fetch
> phase in some configurations:
> - the number of reduce fetcher threads is configured to more than the number
> of hosts
> - therefore "busyEnough()" never returns true
> - the "scheduling" portion of the code can't schedule any new fetches, since
> all of the pending fetches in the mapLocations buffer correspond to hosts
> that are already being fetched (the hosts are in the {{uniqueHosts}} map)
> - {{getCopyResult()}} immediately returns null, since there are no completed
> maps.
> Hence ReduceTask spins back and forth between trying to schedule things (and
> failing), and trying to grab completed results (of which there are none),
> with no waits.
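
The spin described in the issue can be illustrated with a small standalone sketch. This is not the ReduceTask code and not the patch attached to this JIRA: the class `BusyLoopSketch`, its methods, and the use of a `BlockingQueue` as the completed-copy queue are all hypothetical stand-ins. The first loop polls for a result without waiting, as the description says {{getCopyResult()}} effectively does when nothing is complete; the second uses a bounded blocking wait, which is one common way to avoid such a spin.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the loop described in the issue; not Hadoop code.
class BusyLoopSketch {

    // Simulates a fetcher thread that finishes one copy after delayMillis.
    static void deliverAfter(BlockingQueue<String> results, long delayMillis) {
        Thread fetcher = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
                results.put("map-output");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.setDaemon(true);
        fetcher.start();
    }

    // Spinning variant: poll() returns null at once when nothing is complete,
    // so the loop burns CPU until a result shows up. Returns the pass count.
    static long spinUntilResult(BlockingQueue<String> results) {
        long iterations = 0;
        while (true) {
            iterations++;
            // The scheduling step would run here; in the scenario described it
            // schedules nothing, because every pending fetch targets a host
            // that is already being fetched.
            if (results.poll() != null) {
                return iterations;
            }
        }
    }

    // Waiting variant (an assumption, not necessarily the actual fix): block
    // for up to waitMillis per pass instead of returning immediately.
    static long waitUntilResult(BlockingQueue<String> results, long waitMillis) {
        long iterations = 0;
        try {
            while (true) {
                iterations++;
                if (results.poll(waitMillis, TimeUnit.MILLISECONDS) != null) {
                    return iterations;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a copy result", e);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> spinQueue = new LinkedBlockingQueue<>();
        deliverAfter(spinQueue, 200);
        System.out.println("spinning passes: " + spinUntilResult(spinQueue));

        BlockingQueue<String> waitQueue = new LinkedBlockingQueue<>();
        deliverAfter(waitQueue, 200);
        System.out.println("waiting passes:  " + waitUntilResult(waitQueue, 50));
    }
}
```

Both loops return as soon as the simulated fetch completes. The spinning loop typically makes a very large number of passes in that time, while the waiting loop makes only a handful. The exact counts depend on the machine and are not meaningful beyond that contrast.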