Paul Sutter wrote:
it should be possible to have lots of tasks in the shuffle phase
(mostly, sitting around waiting for mappers to run), but only have
"about" one actual reduce phase running per cpu (or whatever works for
each of our apps) that gets enough memory for a sorter, does
substantial computation, etc.

Ah, now I see your point, although I don't see how this would improve overall throughput. In most cases, the optimal configuration is for the total number of reduce tasks in a job to be roughly equal to the number of reduce tasks the cluster can run at once. In that case all reduces start in a single wave, so there is no queue of waiting reduce tasks to schedule.
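
As a rough illustration of the arithmetic behind this point, here is a minimal sketch. The function name, its parameters, and the cluster sizes are hypothetical and chosen only for the example; they are not Hadoop configuration names. It assumes every node offers the same number of concurrent reduce slots.

```python
import math

def reduce_schedule(num_reduce_tasks, nodes, slots_per_node):
    """Return (capacity, waves, queued) for a hypothetical cluster.

    capacity: reduce tasks that can run at once across the cluster
    waves:    rounds of scheduling needed to run all reduce tasks
    queued:   reduce tasks that cannot start in the first wave
    """
    capacity = nodes * slots_per_node
    waves = math.ceil(num_reduce_tasks / capacity)
    queued = max(0, num_reduce_tasks - capacity)
    return capacity, waves, queued

# Reduce count matched to capacity: one wave, nothing waiting.
print(reduce_schedule(40, nodes=20, slots_per_node=2))   # (40, 1, 0)

# More reduces than capacity: 60 tasks wait for a free slot.
print(reduce_schedule(100, nodes=20, slots_per_node=2))  # (40, 3, 60)
```

Under these assumptions, the first call corresponds to the configuration described above: every reduce task gets a slot immediately, so there is nothing left for a scheduler to hold back.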

Doug
