[
https://issues.apache.org/jira/browse/HADOOP-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun C Murthy resolved HADOOP-3751.
-----------------------------------
Resolution: Duplicate
Duplicate of HADOOP-3136.
> Assign tasktrackers more than one task per heartbeat
> ----------------------------------------------------
>
> Key: HADOOP-3751
> URL: https://issues.apache.org/jira/browse/HADOOP-3751
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
>
> Currently each TaskTracker gets one and only one new task to run per
> heartbeat. Also, a TaskTracker immediately rushes to the JobTracker when a
> task completes, without honouring the heartbeat interval (default of 5s).
> This causes several problems:
> 1. It is a utilization bottleneck, especially when the TaskTracker has just
> started up. We should be assigning at least 50% of its capacity.
> 2. If the individual tasks are very short, i.e. they run for less than the
> heartbeat interval, the TaskTracker serially runs _one task at a time_.
> 3. For jobs with small maps, the TaskTracker never gets a chance to schedule
> reduces until _all maps are complete_. This means the shuffle doesn't overlap
> with the maps at all, another sore point.
> Overall, the right approach is to let the TaskTracker advertise the number of
> available map and reduce slots in each heartbeat, and to let the JobTracker
> (i.e. the Scheduler - HADOOP-3412/HADOOP-3445) decide how many tasks, and
> which maps/reduces, the TaskTracker should be assigned. Also, we should ensure
> that the TaskTracker doesn't run to the JobTracker every time a task
> completes - maybe we should hard-limit it to the heartbeat interval, or maybe
> run to the JobTracker only when more than one task has completed in a given
> heartbeat interval, etc.
> Let's discuss.
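
To make the proposal above concrete, here is a minimal sketch of the idea, not Hadoop code. Every name in it (HeartbeatSketch, Heartbeat, ToyScheduler, shouldHeartbeatNow) is hypothetical and chosen only for illustration. It assumes the tracker reports its free map and reduce slots, the scheduler simply fills as many of them as it has pending work for, and the tracker contacts the JobTracker early only when more than one task has completed since the last heartbeat (one of the options floated in the description).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these types exist in Hadoop.
public class HeartbeatSketch {

    enum TaskType { MAP, REDUCE }

    /** Assumed heartbeat payload: the tracker advertises its free slots. */
    static final class Heartbeat {
        final int freeMapSlots;
        final int freeReduceSlots;

        Heartbeat(int freeMapSlots, int freeReduceSlots) {
            this.freeMapSlots = freeMapSlots;
            this.freeReduceSlots = freeReduceSlots;
        }
    }

    /** Toy scheduler that only tracks counts of pending maps and reduces. */
    static final class ToyScheduler {
        private int pendingMaps;
        private int pendingReduces;

        ToyScheduler(int pendingMaps, int pendingReduces) {
            this.pendingMaps = pendingMaps;
            this.pendingReduces = pendingReduces;
        }

        /** Hands out up to one task per advertised free slot, in a single heartbeat. */
        List<TaskType> assign(Heartbeat hb) {
            int maps = Math.min(hb.freeMapSlots, pendingMaps);
            int reduces = Math.min(hb.freeReduceSlots, pendingReduces);
            List<TaskType> assigned = new ArrayList<>();
            for (int i = 0; i < maps; i++) {
                assigned.add(TaskType.MAP);
            }
            for (int i = 0; i < reduces; i++) {
                assigned.add(TaskType.REDUCE);
            }
            pendingMaps -= maps;
            pendingReduces -= reduces;
            return assigned;
        }
    }

    /**
     * Assumed throttling policy: heartbeat when the interval has elapsed,
     * or early when more than one task completed since the last heartbeat.
     */
    static boolean shouldHeartbeatNow(long msSinceLast, long intervalMs,
                                      int completedSinceLast) {
        return msSinceLast >= intervalMs || completedSinceLast > 1;
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler(10, 2);
        // A freshly started tracker with 4 free map slots and 2 free reduce slots.
        System.out.println(scheduler.assign(new Heartbeat(4, 2)));
        // A single completed task 1s into a 5s interval does not trigger a heartbeat.
        System.out.println(shouldHeartbeatNow(1000, 5000, 1));
    }
}
```

In this sketch reduces are handed out alongside maps whenever reduce slots are free, which is what would let the shuffle overlap with the maps. A real scheduler would also have to weigh locality, job priorities and per-job limits, none of which is modelled here.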