Jobtracker doesn't always spread reduce tasks evenly if (mapred.tasktracker.tasks.maximum > 1)
----------------------------------------------------------------------------------------------
Key: HADOOP-465
URL: http://issues.apache.org/jira/browse/HADOOP-465
Project: Hadoop
Issue Type: Bug
Components: mapred
Reporter: Chris Schneider
Priority: Minor
I notice that (at least for Nutch 0.8's Generator.Selector.reduce), if
mapred.reduce.tasks equals the number of tasktrackers and
mapred.tasktracker.tasks.maximum is left at its default of 2, a few of my
tasktrackers typically run no reduce tasks at all, while the same number of
other tasktrackers run two reduce tasks each.
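A toy model reproduces this pattern. It assumes, as a guess and not from the
JobTracker source, that trackers heartbeat in some order and that each one is
greedily handed as many pending reduce tasks as it has free slots. The class
GreedyReduceModel and its assign method are hypothetical names for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical model (NOT the actual JobTracker code): trackers heartbeat in a
 * fixed order, and each one is greedily handed as many pending reduce tasks as
 * it has free slots.
 */
class GreedyReduceModel {

    /**
     * @param reduces   number of reduce tasks waiting to be assigned
     * @param freeSlots free task slots per tracker, in heartbeat order
     * @return number of reduce tasks assigned to each tracker
     */
    static int[] assign(int reduces, int[] freeSlots) {
        int[] assigned = new int[freeSlots.length];
        int remaining = reduces;
        for (int t = 0; t < freeSlots.length && remaining > 0; t++) {
            // Fill every free slot on this tracker before moving on to the next.
            int take = Math.min(freeSlots[t], remaining);
            assigned[t] = take;
            remaining -= take;
        }
        return assigned;
    }

    public static void main(String[] args) {
        // 4 trackers, 4 reduces, mapred.tasktracker.tasks.maximum = 2;
        // trackers 0 and 2 are assumed to still be busy with one map task each.
        int[] freeSlots = {1, 2, 1, 2};
        System.out.println(Arrays.toString(assign(4, freeSlots))); // [1, 2, 1, 0]
    }
}
```

Under these assumptions tracker 1 ends up with two reduces while tracker 3 gets
none, which matches the skew described above.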
It seems that the jobtracker should assign reduce tasks to tasktrackers in
round-robin fashion, so that they are spread as evenly as possible. The current
implementation appears to waste time whenever one or more slave machines must
execute two reduce tasks simultaneously while other tasktrackers sit idle; how
much time is wasted depends on how heavily the reduce tasks depend on the slave
machine's resources.
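A minimal sketch of the round-robin idea follows. It assumes the assigner can
see the free slots of every tracker at once, which is a simplification; the real
jobtracker hands out work as trackers heartbeat. RoundRobinReduceModel and
assign are hypothetical names, not Hadoop APIs.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of round-robin reduce assignment: each pass over the
 * trackers gives at most one reduce task to every tracker with a free slot.
 */
class RoundRobinReduceModel {

    static int[] assign(int reduces, int[] freeSlots) {
        int n = freeSlots.length;
        int[] assigned = new int[n];
        int remaining = reduces;
        boolean progress = true;
        // Stop when all reduces are placed or no tracker has a free slot left.
        while (remaining > 0 && progress) {
            progress = false;
            for (int t = 0; t < n && remaining > 0; t++) {
                if (assigned[t] < freeSlots[t]) {
                    assigned[t]++;
                    remaining--;
                    progress = true;
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Same situation as the greedy model: every tracker gets one reduce.
        System.out.println(Arrays.toString(assign(4, new int[] {1, 2, 1, 2}))); // [1, 1, 1, 1]
        // More reduces than trackers: the surplus is spread one per pass.
        System.out.println(Arrays.toString(assign(6, new int[] {2, 2, 2, 2}))); // [2, 2, 1, 1]
    }
}
```

A tracker only receives a second reduce after every tracker with a free slot
has received its first, so no tracker doubles up while another sits idle.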
I first thought that perhaps the jobtracker was "overloading" the tasktrackers
that had already finished their map tasks (and avoiding those that were still
mapping). However, as I understand it, the reduce tasks are all launched at the
beginning of the job so that they are all ready and waiting for map output data
when it first appears.
--
This message is automatically generated by JIRA.