[ https://issues.apache.org/jira/browse/HADOOP-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi reassigned HADOOP-2014:
----------------------------------

    Assignee: Devaraj Das

> Job Tracker should not clobber the data locality of tasks
> ---------------------------------------------------------
>
>                 Key: HADOOP-2014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2014
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Devaraj Das
>
> Currently, when the Job Tracker assigns a mapper task to a task tracker and
> no split is local to that task tracker, the job tracker finds the first
> runnable task in the master task list and assigns it to the task tracker.
> The split for that task is not local to the task tracker, of course. However,
> the split may be local to other task trackers.
> Assigning that task to that task tracker may decrease the potential
> number of mapper attempts with data locality.
> The desired behavior in this situation is to choose a task whose split is not
> local to any task tracker.
> Resort to the current behavior only if no such task is found.
> In general, it will be useful to know the number of task trackers to which
> each split is local.
> To assign a task to a task tracker, the job tracker should first try to pick
> a task that is local to the task tracker and that has the minimal number of
> task trackers to which it is local. If no task is local to the task tracker,
> the job tracker should try to pick a task that has the minimal number of task
> trackers to which it is local.
> It is worthwhile to instrument the job tracker code to report the number of
> splits that are local to some task tracker.
> That should be the maximum number of tasks with data locality. By comparing
> that number with the actual number of data-local mappers launched, we can
> measure the effectiveness of the job tracker scheduling.
> When we introduce rack locality, we should apply the same principle.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
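
The selection rule proposed in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the JobTracker's code: `LocalityAwarePicker`, `Task`, `pick`, `localityCount`, and `maxDataLocalTasks` are hypothetical names, and the sketch assumes that each split's replica hosts and the set of live task trackers are available as plain sets of host names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed rule; none of these names come from Hadoop.
final class LocalityAwarePicker {

    /** A pending map task plus the hosts holding a replica of its split (assumed known). */
    public static final class Task {
        public final String id;
        public final Set<String> splitHosts;

        public Task(String id, Set<String> splitHosts) {
            this.id = id;
            this.splitHosts = splitHosts;
        }
    }

    /** Number of live task trackers to which the task's split is local. */
    public static int localityCount(Task task, Set<String> liveTrackers) {
        int count = 0;
        for (String host : task.splitHosts) {
            if (liveTrackers.contains(host)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Prefer a task local to the requesting tracker with the fewest alternative
     * local trackers; otherwise take the non-local task with the fewest local
     * trackers (ideally zero). Ties keep list order. Returns null if nothing is pending.
     */
    public static Task pick(List<Task> pending, String tracker, Set<String> liveTrackers) {
        Task bestLocal = null;
        Task bestRemote = null;
        int bestLocalCount = Integer.MAX_VALUE;
        int bestRemoteCount = Integer.MAX_VALUE;
        for (Task task : pending) {
            int count = localityCount(task, liveTrackers);
            if (task.splitHosts.contains(tracker)) {
                if (count < bestLocalCount) {
                    bestLocal = task;
                    bestLocalCount = count;
                }
            } else if (count < bestRemoteCount) {
                bestRemote = task;
                bestRemoteCount = count;
            }
        }
        return bestLocal != null ? bestLocal : bestRemote;
    }

    /** Splits local to at least one live tracker: the bound the issue proposes to report. */
    public static int maxDataLocalTasks(List<Task> tasks, Set<String> liveTrackers) {
        int count = 0;
        for (Task task : tasks) {
            if (localityCount(task, liveTrackers) > 0) {
                count++;
            }
        }
        return count;
    }
}
```

Comparing `maxDataLocalTasks` with the number of data-local mappers actually launched would give the effectiveness measure described above. A rack-aware variant could presumably apply the same counting to racks instead of hosts.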