Assign tasktrackers more than one task per heartbeat
----------------------------------------------------
Key: HADOOP-3751
URL: https://issues.apache.org/jira/browse/HADOOP-3751
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Arun C Murthy
Currently each TaskTracker gets one and only one new task to run per heartbeat.
Also, a TaskTracker immediately rushes to the JobTracker when a task completes
without honouring the heartbeat interval (default of 5s).
This causes several problems:
1. This is a utilization bottleneck, especially right after the TaskTracker
starts up. We should be assigning at least 50% of its capacity.
2. If individual tasks are very short, i.e. they run for less than the heartbeat
interval, the TaskTracker runs them serially, _one task at a time_.
3. For jobs with small maps, the TaskTracker never gets a chance to schedule
reduces until _all maps are complete_. This means the shuffle doesn't overlap
with the maps at all, another sore point.
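To make problems 1 and 2 concrete, here is a toy discrete-event sketch. It assumes a simplified model that is not taken from this issue: an unlimited supply of identical tasks of fixed duration, a TaskTracker that sends an extra heartbeat the moment a task finishes (the "rushes to the JobTracker" behaviour), and a JobTracker that hands out at most `maxPerHeartbeat` tasks per heartbeat. `HeartbeatSim` and `simulate` are hypothetical names used only for illustration.

```java
import java.util.PriorityQueue;

// Toy model of one TaskTracker; hypothetical, not Hadoop code.
class HeartbeatSim {
    /** Returns {tasksCompleted, maxConcurrentTasks} up to horizonSecs. */
    static long[] simulate(int slots, double taskSecs, double intervalSecs,
                           int maxPerHeartbeat, boolean outOfBand, double horizonSecs) {
        // Positive durations keep the clock advancing (no zero-length steps).
        if (taskSecs <= 0 || intervalSecs <= 0) throw new IllegalArgumentException();
        PriorityQueue<Double> running = new PriorityQueue<>(); // task finish times
        double t = 0.0;
        long completed = 0, maxConcurrent = 0;
        while (t <= horizonSecs) {
            // Retire tasks that have finished by this heartbeat.
            while (!running.isEmpty() && running.peek() <= t) {
                running.poll();
                completed++;
            }
            // Heartbeat: the JobTracker assigns up to maxPerHeartbeat tasks into free slots.
            int assign = Math.min(maxPerHeartbeat, slots - running.size());
            for (int i = 0; i < assign; i++) running.add(t + taskSecs);
            maxConcurrent = Math.max(maxConcurrent, running.size());
            // Next heartbeat: the regular interval, or earlier on a task completion.
            double next = t + intervalSecs;
            if (outOfBand && !running.isEmpty()) next = Math.min(next, running.peek());
            t = next;
        }
        return new long[] {completed, maxConcurrent};
    }

    public static void main(String[] args) {
        // 4 slots, 1s tasks, 5s heartbeat interval, 60s horizon.
        long[] current = simulate(4, 1.0, 5.0, 1, true, 60.0);
        long[] fillSlots = simulate(4, 1.0, 5.0, 4, true, 60.0);
        System.out.println("one task per heartbeat: completed=" + current[0]
                + " maxConcurrent=" + current[1]);
        System.out.println("fill free slots:        completed=" + fillSlots[0]
                + " maxConcurrent=" + fillSlots[1]);
    }
}
```

In this toy model, with 4 slots, 1s tasks and the default 5s interval, one task per heartbeat completes 60 tasks in 60s with never more than one running at a time, while filling all free slots completes 240 with all 4 slots busy. The same model with slot filling but heartbeats hard-limited to the interval (`outOfBand = false`) completes 48, so the heartbeat policy and the assignment policy interact for very short tasks.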
Overall, the right approach is to let the TaskTracker advertise the number of
available map and reduce slots in each heartbeat, and to let the JobTracker
(i.e. the Scheduler, see HADOOP-3412/HADOOP-3445) decide how many tasks, and
which maps/reduces, the TaskTracker should be assigned. We should also ensure
that the TaskTracker doesn't run to the JobTracker every time a task completes:
maybe we should hard-limit it to the heartbeat interval, or maybe let it run to
the JobTracker only when more than one task has completed in a given heartbeat
interval, etc.
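A minimal sketch of both halves of the proposal, under stated assumptions: the `Heartbeat` and `Assignment` records, `assignTasks` and `shouldHeartbeat` are hypothetical names, not the actual TaskTracker/JobTracker protocol, and plain FIFO queues stand in for the real Scheduler's choice of which job's tasks to hand out. The JobTracker side fills every advertised free slot in one heartbeat, filling reduce slots independently of maps; the TaskTracker side shows the two heartbeat-throttling options mentioned above as one parameter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SlotScheduler {
    // Hypothetical heartbeat payload: the free slots a TaskTracker advertises.
    record Heartbeat(String trackerName, int freeMapSlots, int freeReduceSlots) {}

    // Hypothetical entry in the heartbeat response.
    record Assignment(String taskId, boolean isMap) {}

    // JobTracker side: fill every advertised free slot in one heartbeat.
    // Reduce slots are filled independently of maps, so reduces (and the
    // shuffle) can start while maps are still pending.
    static List<Assignment> assignTasks(Heartbeat hb, Deque<String> pendingMaps,
                                        Deque<String> pendingReduces) {
        List<Assignment> out = new ArrayList<>();
        for (int i = 0; i < hb.freeMapSlots() && !pendingMaps.isEmpty(); i++) {
            out.add(new Assignment(pendingMaps.poll(), true));
        }
        for (int i = 0; i < hb.freeReduceSlots() && !pendingReduces.isEmpty(); i++) {
            out.add(new Assignment(pendingReduces.poll(), false));
        }
        return out;
    }

    // TaskTracker side: heartbeat when the interval has elapsed, or early once
    // at least minCompleted tasks have finished since the last heartbeat.
    // minCompleted = Integer.MAX_VALUE: hard-limit to the heartbeat interval.
    // minCompleted = 2: go early only when more than one task has completed.
    static boolean shouldHeartbeat(long nowMs, long lastHeartbeatMs, long intervalMs,
                                   int completedSinceLast, int minCompleted) {
        return nowMs - lastHeartbeatMs >= intervalMs
                || completedSinceLast >= minCompleted;
    }

    public static void main(String[] args) {
        Deque<String> maps = new ArrayDeque<>(List.of("m1", "m2", "m3"));
        Deque<String> reduces = new ArrayDeque<>(List.of("r1"));
        // A tracker advertising 2 free map slots and 1 free reduce slot gets
        // m1, m2 and r1 in a single heartbeat response.
        System.out.println(assignTasks(new Heartbeat("tt1", 2, 1), maps, reduces));
        System.out.println(shouldHeartbeat(1000, 0, 5000, 1, 2)); // false: wait
        System.out.println(shouldHeartbeat(1000, 0, 5000, 2, 2)); // true: go early
    }
}
```

Keeping the throttling rule as a single `minCompleted` threshold makes the two options suggested above easy to compare on a test cluster.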
Let's discuss.