Billy,

Billy wrote:
Reduce Jobs must wait for all maps to be done before doing any work. Why are they started before the maps are done?


Reduces are started simultaneously with maps so that the 'shuffle' phase, i.e. copying the outputs of completed maps, can be done in parallel. This is especially important since we have significantly more maps than the no. of available map slots in the cluster, and hence there are multiple waves of maps. This plays nicely since maps are typically cpu-bound while the shuffle is io-bound - keeping your cluster humming.

E.g. sort500 (a 5TB sort on a 500-node Hadoop cluster) runs with ~40,000 maps. Given that we configure the max concurrent maps on a single node as 2, we can run only 1000 of them at a time - hence the maps run in multiple waves, roughly 40 in this case.

Now that http://issues.apache.org/jira/browse/HADOOP-1274 has been fixed (it's in trunk, i.e. coming in hadoop-0.16.0), you can configure different values for the max number of maps and reduces on a per-node basis if your jobs would benefit from that.
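For illustration, something along these lines in each node's hadoop-site.xml should let you set the two limits independently - the property names are the ones introduced by HADOOP-1274, so do double-check them against the 0.16.0 defaults on your build:

  <!-- Per-TaskTracker limits; set on each node, not per-job. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <description>Max number of map tasks this TaskTracker runs at once.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>Max number of reduce tasks this TaskTracker runs at once.</description>
  </property>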

Example of the problem:

If I am running a job and it's taking up all the reduce slots on all nodes, and I launch a second job with a priority higher than the currently running one, it will start running its map tasks, but it has to wait until the first job completes to release the reduce slots. So basically the priority option doesn't gain anything, unless the number of reduce tasks per job is less than the number of nodes.


Something like Hadoop-on-Demand (HoD) solves this for you; see http://issues.apache.org/jira/browse/HADOOP-1301. It's coming soon...

Is there any way we can set an option or a default so that reduce tasks wait until 90% or more of the maps are done/running before launching?


No, not at this point. Like you said, having a smaller no. of reduces per job will help, and HoD definitely will.
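To illustrate the first option: you can cap the no. of reduces for a job in its job configuration (or via JobConf.setNumReduceTasks in the client code), e.g. something like the following - the value 50 is just an example, pick whatever keeps a single job from occupying every reduce slot:

  <!-- In the job's configuration submitted by the client. -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>50</value>
  </property>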

hth,
Arun




