I would consider this to be a very delicate optimization with little utility in the real world. It is very, very rare to reliably know how many records the reducer will see. Getting this wrong would be a disaster. Getting it right would be very difficult in almost all cases.
Moreover, this assumption is baked all through the map-reduce design and thus doing a change to allow reduce to go ahead is likely to be really tricky (not that I know this for a fact). On Mon, Jul 6, 2009 at 11:14 AM, Naresh Rapolu <nareshreddy.rap...@gmail.com > wrote: > My aim is to make the reduce move ahead with reduction as and when it gets > the data required, instead of waiting for all the maps to complete. If it > knows how many records it needs and compares it with number of records it > has got until now, it can move on once they become equal without waiting > for all the maps to finish. >