[jira] Commented: (HADOOP-2014) Job Tracker should prefer input-splits from overloaded racks

Arun C Murthy (JIRA) Sun, 10 Feb 2008 09:39:31 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567453#action_12567453 ]


Arun C Murthy commented on HADOOP-2014:
---------------------------------------

bq.     1.2 Scan all the tasks to find out a task that has the lowest number of 
data-local trackers (and also some load/rack/io/map-slots considerations).

bq.     1.2 Scan all the tasks with lowest number of data-local trackers (and 
also some load/rack/io/map-slots considerations).

Uh, both are _very_ expensive to do on every heartbeat (i.e. in the inner
loop), aren't they?!
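A rough sketch of why the quoted proposals are costly: picking the task with the fewest data-local trackers means scanning every pending map. The `PendingMap` type and `pick_least_local` helper below are hypothetical illustrations, not JobTracker code, and assume each task already knows the set of trackers its split is local to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingMap:
    # Hypothetical stand-in for a runnable map task (not Hadoop code).
    task_id: str
    # Trackers whose node holds a replica of this task's input split.
    local_trackers: frozenset = frozenset()


def pick_least_local(pending):
    """Proposal 1.2 as quoted: the pending map with the fewest local trackers.

    min() walks the whole list, so each call is O(len(pending)); doing this
    inside every heartbeat repeats that full scan each time.
    """
    return min(pending, key=lambda t: len(t.local_trackers), default=None)


pending = [
    PendingMap("m1", frozenset({"tt1", "tt2", "tt3"})),
    PendingMap("m2", frozenset({"tt4"})),
    PendingMap("m3", frozenset({"tt1", "tt5"})),
]
print(pick_least_local(pending).task_id)  # m2
```

Because the scan runs once per heartbeat, its total cost grows with (pending maps) × (heartbeats), which is what makes it a poor fit for the inner loop.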

The reasoning behind Owen's proposal to consider the 'ratio'
{{runnableSplits / mapSlots}} is to handle the case where there are very few
task-trackers in a rack. E.g. let's say there are 200 splits on rack1 with 10
task-trackers (4 slots each) on it, and 100 splits on rack2 with 2
task-trackers... then the ratio penalizes rack2 rather than rack1, which is the
right call to make, since rack2 has far fewer map slots with which to work
through its splits.
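The arithmetic behind the example, as a minimal sketch. The helper name `rack_pressure` is hypothetical, and the example does not say how many slots rack2's trackers have, so 4 slots per tracker is assumed for both racks.

```python
def rack_pressure(runnable_splits, trackers, slots_per_tracker):
    """Hypothetical helper: runnableSplits / mapSlots for one rack."""
    map_slots = trackers * slots_per_tracker
    return runnable_splits / map_slots


# Figures from the example; 4 slots per tracker on rack2 is an assumption.
splits = {"rack1": 200, "rack2": 100}
trackers = {"rack1": 10, "rack2": 2}
ratios = {r: rack_pressure(splits[r], trackers[r], 4) for r in splits}

print(ratios)                       # {'rack1': 5.0, 'rack2': 12.5}
print(max(splits, key=splits.get))  # rack1: raw split count favors the big rack
print(max(ratios, key=ratios.get))  # rack2: the ratio flags the slot-starved rack
```

Ranking racks by raw split count would pick rack1 (200 > 100); ranking by the ratio picks rack2, whose 8 assumed slots face 12.5 splits each versus 5 per slot on rack1.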

> Job Tracker should prefer input-splits from overloaded racks
> ------------------------------------------------------------
>
>                 Key: HADOOP-2014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2014
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Devaraj Das
>
> Currently, when the Job Tracker assigns a mapper task to a task tracker and
> there is no split local to that task tracker, the job tracker finds the
> first runnable task in the map task list and assigns it to the task tracker.
> The split for the task is not local to the task tracker, of course. However,
> the split may be local to other task trackers.
> Assigning that task to this task tracker may decrease the potential
> number of mapper attempts with data locality.
> The desired behavior in this situation is to choose a task whose split is not
> local to any task tracker.
> Resort to the current behavior only if no such task is found.
> In general, it will be useful to know the number of task trackers to which
> each split is local.
> To assign a task to a task tracker, the job tracker should first try to pick
> a task that is local to the task tracker and that is local to the minimal
> number of task trackers. If no task is local to the task tracker, the
> job tracker should try to pick a task that is local to the minimal number of
> task trackers.
> It is worthwhile to instrument the job tracker code to report the number of
> splits that are local to at least one task tracker.
> That number is the maximum number of tasks that can run with data locality.
> By comparing it with the actual number of data-local mappers launched, we can
> measure the effectiveness of the job tracker scheduling.
> When we introduce rack locality, we should apply the same principle.
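The selection policy and the instrumentation proposed in the description can be sketched as follows. This is an illustrative model under stated assumptions, not Hadoop code: `MapTask`, `choose_task`, `max_data_local` and `locality_effectiveness` are hypothetical names, and each task is assumed to carry the set of tracker hosts its split is local to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    # Hypothetical model of a runnable map task (not Hadoop code).
    task_id: str
    # Hosts of the task trackers on which this task's split is local.
    local_trackers: frozenset = frozenset()


def choose_task(tracker, runnable):
    """Pick a map for `tracker` following the policy in the description.

    1. Among maps whose split is local to `tracker`, take the one local to
       the fewest trackers: it has the fewest other chances to run locally.
    2. If none is local, take the map local to the fewest trackers overall,
       so splits local to no tracker are consumed first.
    """
    local = [t for t in runnable if tracker in t.local_trackers]
    candidates = local or runnable
    return min(candidates, key=lambda t: len(t.local_trackers), default=None)


def max_data_local(maps):
    """Upper bound on data-local maps: splits local to at least one tracker."""
    return sum(1 for t in maps if t.local_trackers)


def locality_effectiveness(data_local_launched, maps):
    """Share of the achievable data-local maps that actually ran data-local."""
    bound = max_data_local(maps)
    return data_local_launched / bound if bound else None


tasks = [
    MapTask("m1", frozenset({"tt1", "tt2"})),
    MapTask("m2", frozenset({"tt1"})),
    MapTask("m3"),
    MapTask("m4", frozenset({"tt2", "tt3", "tt4"})),
]
print(choose_task("tt1", tasks).task_id)  # m2: local to tt1 only
print(choose_task("tt9", tasks).task_id)  # m3: local to no tracker
print(max_data_local(tasks))              # 3
```

Preferring the task with the fewest local trackers leaves splits that many trackers could read locally for those trackers. Extending this to rack locality would presumably apply the same rule with racks in place of trackers.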

