optimize allocation of tasks w/ local data
------------------------------------------
Key: HADOOP-173
URL: http://issues.apache.org/jira/browse/HADOOP-173
Project: Hadoop
Type: Improvement
Components: mapred
Versions: 0.2
Reporter: Doug Cutting
Assigned to: Doug Cutting
When a job first starts, all task trackers ask the job tracker for tasks at
once. With many task trackers, the job tracker becomes very slow. The first
kind of task the job tracker tries to find is one with some of its input
data stored on the same node as the requesting task tracker. This case
currently loops through tasks blindly, which, on average, requires
numHosts/(replication*2) iterations to find a match (I think). This could be
optimized by adding a table that maps each host to the tasks whose input
data is stored on that host.
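
A rough sketch of what such a host-to-task table might look like follows. It
is only an illustration of the idea, not the actual JobTracker code: the class
HostTaskIndex, its methods, and the use of integer task ids are all
hypothetical names chosen for this example, and it assumes the hosts holding
each task's input replicas are known when the task is registered.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these names are illustrative and not part of Hadoop.
class HostTaskIndex {
    // Host name -> ids of tasks that have an input replica on that host.
    private final Map<String, Deque<Integer>> tasksByHost = new HashMap<>();
    // Tasks already handed out. A task is listed under every host that
    // holds a replica, so stale entries are skipped when they are polled.
    private final Set<Integer> assigned = new HashSet<>();

    // Register a task under each host that stores part of its input.
    void addTask(int taskId, Collection<String> replicaHosts) {
        for (String host : replicaHosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    // Return an unassigned task local to the given host, or -1 if none remains.
    int pollLocalTask(String host) {
        Deque<Integer> queue = tasksByHost.get(host);
        if (queue == null) {
            return -1;
        }
        while (!queue.isEmpty()) {
            int taskId = queue.poll();
            if (assigned.add(taskId)) {
                return taskId; // first time this task is handed out
            }
        }
        tasksByHost.remove(host); // nothing local is left for this host
        return -1;
    }
}
```

With a table like this, a request from a task tracker costs one hash lookup
plus the removal of any stale entries. Each task appears in at most one queue
per replica, so the total work over a whole job is bounded by roughly
numTasks*replication queue operations instead of a scan per request.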