[ https://issues.apache.org/jira/browse/MAPREDUCE-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-3278:
-----------------------------------

    Attachment: reducer-cpu-usage.png

Here's a before-after of a node running terasort. On the left terasort (unpatched) you can see when the reducers start and eat up a ton of CPU. On the right (patched) terasort, the reducers add more iowait but CPU usage is minimal. top showed the reducers in fetch stage using ~15% CPU instead of ~105% CPU. Total terasort time improved by 10% or so. I'll upload a patch after a bit more testing.

> 0.20: avoid a busy-loop in ReduceTask scheduling
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3278
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3278
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance, task
>    Affects Versions: 0.20.205.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: reducer-cpu-usage.png
>
>
> Looking at profiling results, it became clear that the ReduceTask has the
> following busy-loop which was causing it to suck up 100% of CPU in the fetch
> phase in some configurations:
> - the number of reduce fetcher threads is configured to more than the number
> of hosts
> - therefore "busyEnough()" never returns true
> - the "scheduling" portion of the code can't schedule any new fetches, since
> all of the pending fetches in the mapLocations buffer correspond to hosts
> that are already being fetched (the hosts are in the {{uniqueHosts}} map)
> - {{getCopyResult()}} immediately returns null, since there are no completed
> maps.
> Hence ReduceTask spins back and forth between trying to schedule things (and
> failing), and trying to grab completed results (of which there are none),
> with no waits.
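
For readers who want to see the shape of the problem, the spin described above can be sketched roughly as follows. This is not the ReduceTask code and not the patch mentioned in the comment (which had not been uploaded yet). The class FetchLoopSketch and its methods are hypothetical names, and a plain BlockingQueue of host names stands in for the completed-copy results. The sketch assumes the scheduling step can never schedule anything, as in the configuration described, and compares a non-blocking check for results with a check that waits for a bounded time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class FetchLoopSketch {

    // Busy variant: the result check returns null immediately when no copy
    // has completed, so the loop goes straight back to "scheduling".
    static long spinUntilResult(BlockingQueue<String> results) {
        long iterations = 0;
        while (true) {
            iterations++;
            // A scheduling attempt would go here. Assumed to schedule nothing,
            // because every pending fetch targets a host already being fetched.
            String host = results.poll(); // non-blocking, null if empty
            if (host != null) {
                return iterations;
            }
        }
    }

    // Waiting variant: the result check blocks for up to waitMillis, so the
    // thread sleeps instead of spinning while fetchers are still working.
    static long waitUntilResult(BlockingQueue<String> results, long waitMillis) {
        long iterations = 0;
        try {
            while (true) {
                iterations++;
                String host = results.poll(waitMillis, TimeUnit.MILLISECONDS);
                if (host != null) {
                    return iterations;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Simulates one fetcher thread that completes a copy after delayMillis.
    static Thread startFetcher(BlockingQueue<String> results, long delayMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
                results.put("host-1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>();
        startFetcher(q1, 200);
        long busy = spinUntilResult(q1);

        BlockingQueue<String> q2 = new LinkedBlockingQueue<>();
        startFetcher(q2, 200);
        long waiting = waitUntilResult(q2, 50);

        System.out.println("busy loop iterations:    " + busy);
        System.out.println("waiting loop iterations: " + waiting);
    }
}
```

In this sketch the busy variant runs through its loop a very large number of times before the simulated fetch completes, while the waiting variant wakes only a handful of times. Whether the real fix uses a timed wait, a notification from the fetcher threads, or something else is not stated in this message.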