[ http://issues.apache.org/jira/browse/HADOOP-173?page=all ]

Doug Cutting updated HADOOP-173:
--------------------------------

    Attachment: fast-local-task.patch

This patch optimizes the jobtracker's allocation of tasks to nodes that have 
local data.  I have tested it, but not yet on a large cluster.

> optimize allocation of tasks w/ local data
> ------------------------------------------
>
>          Key: HADOOP-173
>          URL: http://issues.apache.org/jira/browse/HADOOP-173
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Doug Cutting
>  Attachments: fast-local-task.patch
>
> When a job first starts, all task trackers ask the job tracker for tasks at 
> once.  With lots of task trackers, the job tracker gets very slow.  The first 
> type of task that the job tracker attempts to find is one with some of its 
> input data stored on the same node as the task tracker.  This search currently 
> loops through the tasks linearly, which, on average, requires 
> numHosts/(replication*2) iterations to find a match (I think).  This could be 
> optimized by adding a table mapping from each host to the tasks whose input 
> data is stored on that host.
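
For illustration only, the host-to-task table suggested in the description
could look roughly like the sketch below.  This is not the code in
fast-local-task.patch; the class name HostTaskIndex and the methods addTask
and pollLocalTask are hypothetical, and tasks are represented by plain string
ids as a simplifying assumption.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a host-to-task table; not the actual patch. */
class HostTaskIndex {
    // host name -> ids of tasks with some input data stored on that host
    private final Map<String, Deque<String>> tasksByHost = new HashMap<>();
    // ids of tasks that have already been handed to a task tracker
    private final Set<String> assigned = new HashSet<>();

    /** Registers a task under every host that stores part of its input. */
    void addTask(String taskId, Collection<String> hosts) {
        for (String host : hosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    /** Returns an unassigned task local to the host, or null if none is left. */
    String pollLocalTask(String host) {
        Deque<String> queue = tasksByHost.get(host);
        if (queue == null) {
            return null;
        }
        String taskId;
        while ((taskId = queue.poll()) != null) {
            // A task is listed once per replica host, so entries that were
            // already assigned through another host are skipped here.
            if (assigned.add(taskId)) {
                return taskId;
            }
        }
        return null;
    }
}
```

With a table like this, finding a local task for a host is a single map
lookup plus the removal of any stale entries, instead of a scan over all
tasks.  Each task id is removed from a given host's queue at most once, so
the cost of skipping stale entries stays bounded by the number of replicas.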

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
