Spawning tasks faster
---------------------

                 Key: HADOOP-3738
                 URL: https://issues.apache.org/jira/browse/HADOOP-3738
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Spyros Blanas
            Priority: Minor
         Attachments: dynamic_heartbeat.patch

In the current implementation, tasks are assigned to tasktrackers by adding an 
appropriate action to the heartbeat response list. Each heartbeat response can 
start only one task. As the minimum interval between heartbeats is 5 sec (by 
default), if the nodes are strong machines (say, each node has 10 task "slots") 
and the cluster is idle, the later tasks on each node are spawned only after a 
noticeable delay (in our example, the last task will be spawned after 45 seconds).
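
The 45-second figure follows from simple arithmetic: the first task starts on 
the first heartbeat, and each remaining task waits one more interval. A minimal 
sketch of that calculation, assuming exactly one task per heartbeat response 
(the class and method names are hypothetical and do not exist in Hadoop):

```java
// Hypothetical helper, not part of Hadoop: models the spawn delay described above.
final class SpawnDelay {

    /** Seconds until the last of `slots` tasks starts, at one task per heartbeat. */
    static int lastSpawnSeconds(int slots, int heartbeatIntervalSeconds) {
        if (slots <= 0) {
            return 0;
        }
        // The first task starts at t = 0; every further task waits one more interval.
        return (slots - 1) * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        // 10 slots, 5 sec heartbeat interval: (10 - 1) * 5
        System.out.println(lastSpawnSeconds(10, 5)); // prints 45
    }
}
```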

Spawning tasks faster can significantly improve the end-to-end execution time 
if most jobs finish in the order of minutes.

The patch I attach asks each TaskTracker to send its next heartbeat after 1/5th 
of the regular heartbeat interval if it was assigned a task in this round, 
making the spawning of multiple tasks much more efficient.
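
The idea can be sketched roughly as follows. This is not the code from 
dynamic_heartbeat.patch; the class, method, and constant names are hypothetical, 
and the sketch assumes that an idle cluster hands the tracker a task on every 
heartbeat until its slots are full:

```java
// Hypothetical sketch of the shortened-heartbeat idea; not taken from the patch.
final class DynamicHeartbeat {

    // Assumed speed-up factor, matching the "1/5th" mentioned above.
    static final int SPEEDUP_FACTOR = 5;

    /** Interval the tracker should wait before its next heartbeat. */
    static long nextIntervalMs(long regularIntervalMs, boolean taskAssigned) {
        return taskAssigned ? regularIntervalMs / SPEEDUP_FACTOR : regularIntervalMs;
    }

    /** Time until the last of `slots` tasks starts when every round assigns a task. */
    static long timeToFillSlotsMs(int slots, long regularIntervalMs) {
        long elapsedMs = 0;
        // The first task starts at t = 0; each later one follows a shortened interval.
        for (int started = 1; started < slots; started++) {
            elapsedMs += nextIntervalMs(regularIntervalMs, true);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // 10 slots with a 5000 ms regular interval: 9 shortened intervals of 1000 ms
        System.out.println(timeToFillSlotsMs(10, 5000)); // prints 9000
    }
}
```

Under these assumptions the last task in the 10-slot example would start after 
9 seconds instead of 45.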

A better approach would be to have each TaskTracker report the number of free 
slots it has (instead of only whether it can accept more work) and have the 
JobTracker push the appropriate number of tasks in one response, but this would 
require changes to the current communication protocol.
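
A minimal sketch of what the JobTracker side of that approach might look like, 
assuming the heartbeat carried a free-slot count. All names here are 
hypothetical, tasks are represented as plain strings, and none of this reflects 
the existing protocol:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: assign up to `freeSlots` tasks in a single response.
final class SlotAwareAssignment {

    /** Removes up to `freeSlots` tasks from the pending queue and returns them. */
    static List<String> assign(int freeSlots, Deque<String> pendingTasks) {
        List<String> toLaunch = new ArrayList<>();
        while (toLaunch.size() < freeSlots && !pendingTasks.isEmpty()) {
            toLaunch.add(pendingTasks.poll());
        }
        return toLaunch;
    }

    public static void main(String[] args) {
        Deque<String> pending = new ArrayDeque<>();
        for (int i = 0; i < 12; i++) {
            pending.add("task_" + i);
        }
        // A tracker reporting 10 free slots receives 10 tasks in one response.
        List<String> launched = assign(10, pending);
        System.out.println(launched.size() + " launched, " + pending.size() + " pending");
    }
}
```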

