Hi Hadoopers,

Many of the computations I'm performing with MapReduce require chains of MapReduce operations, where the output of one or more previous reduce steps is the input to a later mapper. Since the JobConf object doesn't seem to let you specify a chain of jobs (or does it? I may just be misunderstanding the API), I've simply been waiting for JobClient.runJob on one step to return (which only happens when 100% of the reducers have finished) and then executing the next job. A rough sketch of this sequential approach is at the end of this message.

What I'm wondering is whether there is any way to make the system take advantage of the time that is currently wasted at the end of the first job's reduce phase, when most of the reducers have completed but before 100% have finished. This can be fairly significant: for one computation I've been working on lately, over 25% of the time is spent in the last 10% of each map/reduce operation (this has to do with the natural distribution of my input data and would be unavoidable even given an optimal partitioning). During this time, I have dozens of nodes sitting idle that could be executing the map part of the next job, if only the framework knew that it was coming.

Has anyone dealt with this or found a good workaround?
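For reference, here's roughly what my current sequential chaining looks like. This is a minimal sketch using the old org.apache.hadoop.mapred API; the class, job, and path names are just placeholders for my actual setup:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainedJobs {
        public static void main(String[] args) throws Exception {
            // First pass: reads the raw input, writes intermediate output.
            JobConf first = new JobConf(ChainedJobs.class);
            first.setJobName("first-pass");
            FileInputFormat.setInputPaths(first, new Path("/data/input"));
            FileOutputFormat.setOutputPath(first, new Path("/data/intermediate"));
            // ... set mapper/reducer classes for the first pass ...

            // Blocks until 100% of the first job's reducers have finished,
            // even though most nodes go idle well before that point.
            JobClient.runJob(first);

            // Second pass: its mappers consume the first job's reduce output.
            JobConf second = new JobConf(ChainedJobs.class);
            second.setJobName("second-pass");
            FileInputFormat.setInputPaths(second, new Path("/data/intermediate"));
            FileOutputFormat.setOutputPath(second, new Path("/data/output"));
            // ... set mapper/reducer classes for the second pass ...

            JobClient.runJob(second);
        }
    }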
Thanks! Chris
