On 06/18/2012 10:19 AM, Mark Kerzner wrote:
> If only reducers could be told to start their work on the first maps that they see, my processing would begin to show results much earlier, before all the mappers are done.
The sort/shuffle phase isn't just about ordering the keys; it's about collecting all the map-phase results that share a key so that a reducer can work on them together. A reducer therefore can't start on a key until every mapper has finished, because any mapper still running might emit another value for that key. If your reducer can operate on mapper outputs independently of each other, then it sounds like it's really another mapper. Either factor it into the existing mapper, or rewrite it as a mapper of its own and chain the two with ChainMapper (if you're using the older API).
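
To make the grouping point concrete, here is a toy simulation in plain Python, not Hadoop code. The function names (map_words, shuffle, reduce_counts) and the two input splits are invented for the example. It is only a sketch of why a reducer that ran on the first finished mapper's output would report partial results.

```python
from collections import defaultdict

def map_words(line):
    """Hypothetical mapper: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.split()]

def shuffle(mapper_outputs):
    """Toy shuffle: gather the values for each key across the given mapper outputs."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)

def reduce_counts(grouped):
    """Hypothetical reducer: sum the list of values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Two made-up input splits, one per mapper.
splits = ["a b a", "b a c"]
mapper_outputs = [map_words(s) for s in splits]

# Reducing after only the first mapper has finished gives partial counts.
early = reduce_counts(shuffle(mapper_outputs[:1]))

# Reducing after all mappers have finished gives the complete counts.
final = reduce_counts(shuffle(mapper_outputs))

print("after first mapper:", early)
print("after all mappers: ", final)
```

The early counts for "a" and "b" are too low, and "c" is missing entirely, because the second mapper had not yet contributed its values. If that kind of partial answer is acceptable for your job, the per-record work doesn't need the grouping at all and can live in a mapper.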