Billy,
Billy wrote:
Reduce Jobs must wait for all maps to be done before doing any work. Why are
they started before the maps are done?
Reduces are started simultaneously with maps so that the 'shuffle' phase,
i.e. copying the outputs of completed maps, can be done in parallel. This
is especially important since we have significantly more maps than the
no. of available map slots in the cluster, and hence there are waves of
maps. This plays nicely since maps are typically cpu-bound and shuffle
is io-bound - keeping your cluster humming.
E.g. sort500 (a 5TB sort on a 500-node hadoop cluster) runs with ~40,000
maps. Given that we configure the max concurrent maps on a single node
as 2, we can run only 1000 of them concurrently, and hence the multiple
waves of maps.
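The arithmetic behind those waves can be sketched as follows (a minimal sketch; the node count and per-node slot limit are the sort500 example's figures, not defaults):

```python
# Waves of maps in the sort500 example: 40,000 maps on a
# 500-node cluster with 2 concurrent map slots per node.
nodes = 500
map_slots_per_node = 2
total_maps = 40_000

concurrent_maps = nodes * map_slots_per_node  # maps running at once
waves = total_maps // concurrent_maps         # successive waves of maps

print(concurrent_maps, waves)  # 1000 concurrent maps, 40 waves
```

While each wave of maps runs, the reduces that are already launched pull the previous waves' outputs, which is the overlap described above.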
Now that http://issues.apache.org/jira/browse/HADOOP-1274 has been fixed
(in trunk, i.e. coming in hadoop-0.16.0) you can configure different
values for the max maps and max reduces on a per-node basis if your jobs
would benefit from it.
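If I recall the patch correctly, the per-node limits are set via tasktracker properties along these lines in hadoop-site.xml (a sketch; the values are illustrative, not recommendations):

```xml
<!-- hadoop-site.xml: per-tasktracker slot limits (illustrative values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```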
Example of the problem:
If I am running a job that is taking up all the reduce slots on all nodes,
and I launch a second job with a priority higher than the currently
running one, it will start running its map tasks, but I have to wait
until the first job completes to release the reduce slots. So basically
the priority option gains nothing, unless the number of reduce tasks per
job is less than the number of nodes.
Something like Hadoop-on-Demand solves this for you, see
http://issues.apache.org/jira/browse/HADOOP-1301. It's coming soon...
Any way we can set an option or default so that reduce tasks wait until
90% or more of the maps are done/running before launching?
No, not at this point. Like you said, having a smaller no. of reduces
will help, or HoD definitely will.
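For the smaller-reduce-count workaround, the per-job reduce count can be lowered via the job's configuration, e.g. (a sketch; 100 is an illustrative value, pick something below your cluster's total reduce slots):

```xml
<!-- job configuration: request fewer reduces than the cluster has reduce slots -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>100</value>
</property>
```

That leaves some reduce slots free for a later, higher-priority job to grab.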
hth,
Arun