Job Tracker should not clobber the data locality of tasks
---------------------------------------------------------
Key: HADOOP-2014
URL: https://issues.apache.org/jira/browse/HADOOP-2014
Project: Hadoop
Issue Type: Bug
Reporter: Runping Qi
Currently, when the job tracker assigns a mapper task to a task tracker and
no split is local to that task tracker, the job tracker finds the first
runnable task in the master task list and assigns it to the task tracker.
The split for the task is not local to the task tracker, of course. However,
the split may be local to other task trackers.
Assigning that task to that task tracker may decrease the potential number
of mapper attempts with data locality.
The desired behavior in this situation is to choose a task whose split is not
local to any task tracker.
Resort to the current behavior only if no such task is found.
In general, it will be useful to know the number of task trackers to which each
split is local.
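As a minimal sketch of how such a count could be derived (not the job tracker's actual code): the class SplitLocality, the method localTrackerCounts, and the plain string ids for splits and hosts are all hypothetical names introduced for illustration. The sketch also assumes one task tracker per host.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Hadoop API. */
final class SplitLocality {
    private SplitLocality() {}

    /**
     * For each split, counts the task trackers to which the split is local.
     *
     * @param splitHosts   split id -> hosts that hold a replica of the split
     * @param trackerHosts hosts that currently run a task tracker
     *                     (assumption: one task tracker per host)
     */
    static Map<String, Integer> localTrackerCounts(
            Map<String, List<String>> splitHosts, Set<String> trackerHosts) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : splitHosts.entrySet()) {
            // De-duplicate hosts so a repeated host is counted only once.
            Set<String> hosts = new HashSet<>(entry.getValue());
            hosts.retainAll(trackerHosts);
            counts.put(entry.getKey(), hosts.size());
        }
        return counts;
    }
}
```

A split whose count is 0 is not local to any task tracker, which is the kind of task the desired behavior above prefers for a tracker that has no local split.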
To assign a task to a task tracker, the job tracker should first try to pick a
task that is local to the task tracker and that has the minimal number of task
trackers to which it is local. If no task is local to the task tracker, the job
tracker should try to pick a task that has the minimal number of task trackers
to which it is local.
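The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the job tracker's implementation: LocalityAwarePicker, Task, and pick are hypothetical names, and each task is assumed to already carry the set of task trackers to which its split is local.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed policy; not Hadoop code. */
final class LocalityAwarePicker {
    private LocalityAwarePicker() {}

    /** A runnable map task and the task trackers to which its split is local. */
    static final class Task {
        final String id;
        final Set<String> localTrackers;

        Task(String id, Set<String> localTrackers) {
            this.id = id;
            this.localTrackers = localTrackers;
        }
    }

    /**
     * Picks a task for the given tracker, or returns null if none is runnable.
     * Ties are broken by list order, so when every candidate has the same
     * count the result is the first runnable task, as in the current behavior.
     */
    static Task pick(String tracker, List<Task> runnable) {
        Task bestLocal = null; // local to this tracker, fewest local trackers
        Task bestAny = null;   // fewest local trackers overall
        for (Task t : runnable) {
            int n = t.localTrackers.size();
            if (t.localTrackers.contains(tracker)
                    && (bestLocal == null || n < bestLocal.localTrackers.size())) {
                bestLocal = t;
            }
            if (bestAny == null || n < bestAny.localTrackers.size()) {
                bestAny = t;
            }
        }
        // bestAny is consulted only when no task is local to this tracker,
        // so in that case it is necessarily a non-local task.
        return bestLocal != null ? bestLocal : bestAny;
    }
}
```

Because a task whose split is local to no task tracker has a count of 0, the fallback branch prefers such tasks first, which matches the desired behavior described earlier.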
It is worthwhile to instrument the job tracker code to report the number of
splits that are local to at least one task tracker.
That should be the maximum number of tasks with data locality. By comparing
that number with the actual number of data-local mappers launched, we can
measure the effectiveness of the job tracker scheduling.
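One way such a report could be computed is sketched below. LocalityReport and its methods are hypothetical names, the input is assumed to be a map from split id to the number of task trackers to which that split is local, and treating the no-local-splits case as a ratio of 1.0 is an arbitrary choice made for this sketch.

```java
import java.util.Map;

/** Hypothetical instrumentation sketch; not part of the Hadoop API. */
final class LocalityReport {
    private LocalityReport() {}

    /** Number of splits local to at least one task tracker: the upper bound. */
    static int maxDataLocalTasks(Map<String, Integer> localTrackerCounts) {
        int max = 0;
        for (int count : localTrackerCounts.values()) {
            if (count > 0) {
                max++;
            }
        }
        return max;
    }

    /**
     * Ratio of data-local mappers actually launched to the upper bound.
     * Returns 1.0 when no split could have been local (assumption of this
     * sketch: nothing was lost in that case).
     */
    static double effectiveness(int dataLocalLaunched, int maxDataLocal) {
        if (maxDataLocal == 0) {
            return 1.0;
        }
        return (double) dataLocalLaunched / maxDataLocal;
    }
}
```

For example, if 8 of 10 splits are local to some task tracker and 6 data-local mappers were launched, the ratio is 6 / 8 = 0.75.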
When we introduce rack locality, we should apply the same principle.