[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-3278:
-----------------------------------

    Attachment: mr-3278.txt

Here's a candidate patch which fixed the CPU spinning on my cluster.

Worth noting that this problem is more severe when the fetcher thread count is 
configured higher than the number of nodes. However, it still happens even with 
fewer fetchers than nodes, as soon as the number of unique nodes holding map 
output drops below the number of threads.
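
The condition above can be written as a small check. This is only an 
illustrative sketch: the class and method names are hypothetical and do not 
come from ReduceTask or from the attached patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: expresses the condition under
// which the spin described in this comment can occur.
class SpinCondition {
    // hostsWithPendingOutput may contain the same host several times,
    // once per map output it holds.
    static boolean canSpin(int fetcherThreads, List<String> hostsWithPendingOutput) {
        Set<String> uniqueHosts = new HashSet<>(hostsWithPendingOutput);
        // Fewer distinct hosts than fetcher threads means some threads
        // can never be given work.
        return uniqueHosts.size() < fetcherThreads;
    }
}
```

For example, with 4 fetcher threads and pending output on hosts a, a and b, 
there are only 2 unique hosts, so the check returns true.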
                
> 0.20: avoid a busy-loop in ReduceTask scheduling
> ------------------------------------------------
>
>                 Key: MAPREDUCE-3278
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3278
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance, task
>    Affects Versions: 0.20.205.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mr-3278.txt, reducer-cpu-usage.png
>
>
> Looking at profiling results, it became clear that the ReduceTask has the 
> following busy-loop, which was causing it to use 100% of CPU in the fetch 
> phase in some configurations:
> - the number of reduce fetcher threads is configured to be greater than the 
> number of hosts
> - therefore {{busyEnough()}} never returns true
> - the "scheduling" portion of the code can't schedule any new fetches, since 
> all of the pending fetches in the mapLocations buffer correspond to hosts 
> that are already being fetched (the hosts are in the {{uniqueHosts}} map)
> - {{getCopyResult()}} immediately returns null, since there are no completed 
> maps.
>
> Hence ReduceTask spins back and forth between trying to schedule things (and 
> failing) and trying to grab completed results (of which there are none), 
> with no waits.
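
The loop described above, and one generic way to avoid the spin, can be 
sketched as follows. This is not the contents of mr-3278.txt and not the real 
ReduceTask code: the class, the queue, the 50 ms timeout and the method names 
other than the idea of getCopyResult() are assumptions made for illustration. 
The sketch replaces the immediate null return with a bounded wait and counts 
how often the loop goes around in each variant.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: not the attached patch or the real ReduceTask.
class FetchLoopSketch {
    // Hypothetical stand-in for the collection of finished copy results.
    private final BlockingQueue<String> copyResults = new LinkedBlockingQueue<>();

    // Mirrors the behaviour described in the issue: returns null at once
    // when no copy has completed.
    String getCopyResultNonBlocking() {
        return copyResults.poll();
    }

    // Assumed fix: wait up to timeoutMillis for a result before returning null.
    String getCopyResultBlocking(long timeoutMillis) throws InterruptedException {
        return copyResults.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Runs the loop until one simulated fetch (taking fetchMillis) completes
    // and returns how many times the loop went around.
    static int runLoop(boolean blocking, long fetchMillis) {
        FetchLoopSketch sketch = new FetchLoopSketch();
        Thread fetcher = new Thread(() -> {
            try {
                Thread.sleep(fetchMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.copyResults.add("map_0");
        });
        fetcher.start();

        int iterations = 0;
        String result = null;
        try {
            while (result == null) {
                iterations++;
                // Scheduling step omitted: in the scenario above nothing can
                // be scheduled, because every pending host is already busy.
                result = blocking
                        ? sketch.getCopyResultBlocking(50)
                        : sketch.getCopyResultNonBlocking();
            }
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("non-blocking iterations: " + runLoop(false, 200));
        System.out.println("blocking iterations:     " + runLoop(true, 200));
    }
}
```

In the non-blocking variant the loop runs as fast as the CPU allows until the 
fetch finishes, whereas with the bounded wait it goes around only a handful 
of times over the same period.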


        
