optimize allocation of tasks w/ local data
------------------------------------------

         Key: HADOOP-173
         URL: http://issues.apache.org/jira/browse/HADOOP-173
     Project: Hadoop
        Type: Improvement

  Components: mapred  
    Versions: 0.2    
    Reporter: Doug Cutting
 Assigned to: Doug Cutting 


When a job first starts, all task trackers ask the job tracker for tasks at 
once.  With many task trackers, the job tracker becomes very slow.  The first 
kind of task the job tracker tries to find is one with some of its input 
data stored on the same node as the requesting task tracker.  This lookup 
currently scans the task list linearly, which, on average, requires 
numHosts/(replication*2) iterations to find a match (I think).  This could be 
optimized by adding a table that maps each host to the tasks whose input 
data is stored on it.
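
A rough sketch of what such a table might look like follows.  It is only an 
illustration, not the actual patch: the class name HostTaskIndex, the 
methods addTask and findLocalTask, and the use of integer task ids are all 
assumptions made for the example and are not part of the Hadoop code base.  
Each task is listed under every host holding a replica of its input, and 
entries for tasks already handed out via another host are skipped lazily.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: index from host name to tasks with input stored there. */
class HostTaskIndex {
    // host -> ids of tasks that have at least one input replica on that host
    private final Map<String, Deque<Integer>> tasksByHost = new HashMap<>();
    // ids of tasks already handed out (a task is listed once per replica host)
    private final Set<Integer> assigned = new HashSet<>();

    /** Register a task under every host that holds a replica of its input. */
    void addTask(int taskId, List<String> replicaHosts) {
        for (String host : replicaHosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    /** Return an unassigned task with input on the given host, or -1 if none. */
    int findLocalTask(String host) {
        Deque<Integer> candidates = tasksByHost.get(host);
        if (candidates == null) {
            return -1; // no task has input data on this host
        }
        Integer id;
        while ((id = candidates.poll()) != null) {
            // Set.add returns false if the task was already assigned via
            // another replica host; drop that stale entry and keep looking.
            if (assigned.add(id)) {
                return id;
            }
        }
        return -1;
    }
}
```

With this layout a request from a task tracker costs one hash lookup plus 
the removal of any stale entries.  Each entry is polled at most once, so the 
total work over a job is bounded by numTasks*replication rather than by a 
scan of the task list on every request.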


