[jira] Updated: (HADOOP-1719) Improve the utilization of shuffle copier threads

Devaraj Das (JIRA) Wed, 15 Aug 2007 23:13:54 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Devaraj Das updated HADOOP-1719:
--------------------------------

    Attachment: 1719.patch

This patch addresses the issue. It also adds another check to the 'busy' logic: 
if #uniqueHosts < #copiers, we conclude that we are not busy enough. The 
reasoning is that we fetch only one map output at a time from any given host, 
and uniqueHosts keeps track of that. As soon as we add a host to uniqueHosts, a 
copy from that host is scheduled as well. Thus, when the size of uniqueHosts is 
>= numCopiers, all copiers are busy. The converse does not hold (e.g. with more 
copiers than hosts in the cluster, uniqueHosts can never reach numCopiers), but 
the check should generally be useful.
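
The check described above can be sketched roughly as follows. This is only an 
illustration and not the code in 1719.patch: the class name ShuffleBusyCheck and 
its methods are hypothetical, and removing a host from uniqueHosts when its copy 
finishes is an assumption made for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the 'busy' check; not taken from the patch.
final class ShuffleBusyCheck {
    private final int numCopiers;
    // Hosts that currently have a copy scheduled (one copy per host at a time).
    private final Set<String> uniqueHosts = new HashSet<>();

    ShuffleBusyCheck(int numCopiers) {
        this.numCopiers = numCopiers;
    }

    // Records a scheduled copy from the given host.
    // Returns false if a copy from that host is already in flight.
    boolean scheduleCopy(String host) {
        return uniqueHosts.add(host);
    }

    // Assumption: the host is released once its copy completes.
    void copyFinished(String host) {
        uniqueHosts.remove(host);
    }

    // True when at least numCopiers distinct hosts have a copy scheduled,
    // which implies that every copier thread has work.
    boolean allCopiersBusy() {
        return uniqueHosts.size() >= numCopiers;
    }
}
```

With this shape, the main loop would query for more map locations whenever 
allCopiersBusy() returns false, instead of waiting for a copier to report back.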

> Improve the utilization of shuffle copier threads
> -------------------------------------------------
>
>                 Key: HADOOP-1719
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1719
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.15.0
>
>         Attachments: 1719.patch
>
>
> In the current design, once copies have been scheduled, the scheduler (the 
> main loop in fetchOutputs) won't schedule anything more until it hears back 
> from at least one of the copier threads. Due to this, the main loop won't 
> query the TaskTracker for new map locations and may not be using all the 
> copiers effectively. This may not be an issue for small map outputs, where, 
> at steady state, such notifications are frequent.
> Ideally, we should schedule everything we can and, depending on how busy we 
> currently are, query the TaskTracker for more map locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
