Improve the utilization of shuffle copier threads
-------------------------------------------------

                 Key: HADOOP-1719
                 URL: https://issues.apache.org/jira/browse/HADOOP-1719
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
            Reporter: Devaraj Das
            Assignee: Devaraj Das
             Fix For: 0.15.0


In the current design, once a round of copies has been scheduled, the 
scheduler (the main loop in fetchOutputs) won't schedule anything more until 
it hears back from at least one of the copier threads. Because of this, the 
main loop also doesn't query the TaskTracker for new map locations while it 
waits, and so it may not be using all the copiers effectively. This may not be 
an issue for small map outputs, where, at steady state, such notifications 
arrive frequently.
Ideally, we should schedule everything we can and then, depending on how busy 
we currently are, query the TaskTracker for more map locations.
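
A minimal sketch of the proposed behaviour, to make the idea concrete. This 
is not the actual fetchOutputs code: the class ShuffleSchedulerSketch and all 
of its method names are hypothetical, and the copier threads and the 
TaskTracker are only modelled by a counter and a queue. It shows a loop body 
that hands out every known map output it can, and asks for more locations only 
when copiers are idle and nothing is left to give them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only; not the real ReduceTask/fetchOutputs code.
public class ShuffleSchedulerSketch {
    private final int numCopiers;   // size of the copier thread pool
    private int busyCopiers = 0;    // copies currently in flight
    // Map output locations we know about but have not scheduled yet.
    private final Queue<String> known = new ArrayDeque<>();

    public ShuffleSchedulerSketch(int numCopiers) {
        this.numCopiers = numCopiers;
    }

    // Record locations returned by a (modelled) TaskTracker query.
    public void addLocations(List<String> locations) {
        known.addAll(locations);
    }

    // Schedule as many known outputs as there are free copiers,
    // without waiting for any copier to report back first.
    public List<String> scheduleAvailable() {
        List<String> scheduled = new ArrayList<>();
        while (busyCopiers < numCopiers && !known.isEmpty()) {
            scheduled.add(known.remove());
            busyCopiers++;
        }
        return scheduled;
    }

    // Called when a copier thread reports that a copy has finished.
    public void copyFinished() {
        if (busyCopiers > 0) {
            busyCopiers--;
        }
    }

    // Query for more locations only if some copier is idle
    // and there is nothing left to hand to it.
    public boolean shouldQueryForLocations() {
        return busyCopiers < numCopiers && known.isEmpty();
    }
}
```

The busy/idle test here is the simplest possible policy; a real 
implementation might query earlier, for example when the backlog of known 
locations drops below some threshold.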


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
