[ http://issues.apache.org/jira/browse/HADOOP-465?page=comments#action_12429540 ]
            
Owen O'Malley commented on HADOOP-465:
--------------------------------------

There was a bug in the Hadoop scheduler that would schedule too many tasks on a 
node when the cluster was not full. I fixed that in HADOOP-400, which was 
committed after 0.5.0 was cut.

Another thing to keep in mind is that once tasks are started, they can't be 
moved. So it is common for your last four (or six or eight) reduces to end up 
running on two (or three or four) nodes (assuming two tasks per node).
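The difference between the current slot-filling placement and the round-robin placement the reporter proposes (quoted below) can be sketched roughly as follows. This is a hypothetical illustration, not the actual JobTracker code; the function names and the two-slots-per-tracker default are invented for the example.

```python
def assign_greedy(num_reduces, trackers, slots_per_tracker=2):
    """Fill each tracker up to its slot limit before moving to the next.

    This mimics the bunching behavior described in the issue: early trackers
    get two reduces each while later trackers sit idle.
    """
    assignment = {t: 0 for t in trackers}
    for _ in range(num_reduces):
        for t in trackers:
            if assignment[t] < slots_per_tracker:
                assignment[t] += 1
                break
    return assignment


def assign_round_robin(num_reduces, trackers, slots_per_tracker=2):
    """Hand out one task per tracker per pass, so load stays as even as possible."""
    assignment = {t: 0 for t in trackers}
    i = 0
    for _ in range(num_reduces):
        # Walk the tracker list circularly, skipping trackers that are full.
        for _ in range(len(trackers)):
            t = trackers[i % len(trackers)]
            i += 1
            if assignment[t] < slots_per_tracker:
                assignment[t] += 1
                break
    return assignment


trackers = ["tt1", "tt2", "tt3", "tt4"]
print(assign_greedy(4, trackers))       # {'tt1': 2, 'tt2': 2, 'tt3': 0, 'tt4': 0}
print(assign_round_robin(4, trackers))  # {'tt1': 1, 'tt2': 1, 'tt3': 1, 'tt4': 1}
```

With mapred.reduce.tasks equal to the number of trackers, the greedy variant doubles up half the trackers and leaves the rest idle, while the round-robin variant gives each tracker exactly one reduce, which is the even spread the issue asks for.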


> Jobtracker doesn't always spread reduce tasks evenly if 
> (mapred.tasktracker.tasks.maximum > 1)
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-465
>                 URL: http://issues.apache.org/jira/browse/HADOOP-465
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Chris Schneider
>            Priority: Minor
>
> I note that (at least for Nutch 0.8 Generator.Selector.reduce) if 
> mapred.reduce.tasks is the same as the number of tasktrackers, and 
> mapred.tasktracker.tasks.maximum is left at the default of 2, I typically 
> have no reduce tasks running on a few of my tasktrackers, and two reduce 
> tasks running on the same number of other tasktrackers.
> It seems like the jobtracker should assign reduce tasks to tasktrackers in a 
> round robin fashion, so that the distribution will be spread as evenly as 
> possible. The current implementation would seem to waste at least some time 
> if one or more slave machines have to execute two reduce tasks simultaneously 
> while other tasktrackers sit idle, with the amount of wasted time depending 
> on how heavily the reduce tasks contend for the slave machine's resources.
> I first thought that perhaps the jobtracker was "overloading" the 
> tasktrackers that had already finished their map tasks (and avoiding those 
> that were still mapping). However, as I understand it, the reduce tasks are 
> all launched at the beginning of the job so that they are all ready and 
> waiting for map output data when it first appears.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira