I just reread my first post. Maybe I was not clear enough: the only thing that matters to me is that the Reduce tasks _start_ in a specified order based on their key. That is the only additional constraint I need.
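For what it's worth, here is a minimal sketch of the closest thing stock Hadoop offers, assuming the 0.20.x "new" API: with a single reduce task, every key goes to one reducer and reduce() is invoked once per key in the order defined by the sort comparator, so each key group effectively starts in that order. The job name, I/O classes, and the commented-out MyKeyComparator are placeholders of mine, not anything from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class OrderedByKeyJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "ordered-by-key");
    job.setJarByClass(OrderedByKeyJob.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // One reduce task: all keys land in a single reducer, and the
    // framework calls reduce() per key in sort-comparator order.
    job.setNumReduceTasks(1);
    // Optional: impose a custom key order instead of the default
    // WritableComparable ordering (hypothetical comparator class).
    // job.setSortComparatorClass(MyKeyComparator.class);
    job.waitForCompletion(true);
  }
}

This sidesteps task scheduling entirely, of course, at the cost of giving up all reduce-side parallelism.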
On Mon, Dec 20, 2010 at 9:51 AM, Martin Becker <_martinbec...@web.de> wrote:
> As far as I understand, MapReduce waits for all Mappers to finish
> before it starts running Reduce tasks. Am I mistaken here? If I am not,
> then I do not see any more synchrony being introduced than there
> already is (no locks required). Of course I am not aware of all the
> internals, but MapReduce works with a single JobTracker, which
> distributes Reduce tasks to the different nodes (see
> http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Overview).
> So the only point where my "theory" would break is if Reducers start
> before Mappers finish. Otherwise the JobTracker should be able to
> schedule Reduce tasks in a specific order.
>
> On Mon, Dec 20, 2010 at 4:45 AM, Harsh J <qwertyman...@gmail.com> wrote:
>> You could use a sort of distributed lock service to achieve this
>> (ZooKeeper can help). But such things ought to be avoided, as David
>> pointed out above.
>>
>> On Sun, Dec 19, 2010 at 9:09 PM, Martin Becker <_martinbec...@web.de> wrote:
>>> Hello everybody,
>>>
>>> Is there a possibility to make sure that certain/all reduce tasks,
>>> i.e. the reducers for certain keys, are executed in a specified order?
>>> This is job-internal, so the Job Scheduler is probably the wrong place
>>> to start?
>>> Does the order induced by the Comparable interface influence the
>>> execution order at all?
>>>
>>> Thanks in advance,
>>> Martin
>>
>> --
>> Harsh J
>> www.harshj.com
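For completeness, a rough sketch of the ZooKeeper idea Harsh mentions, with the caveat that it is exactly the kind of cross-task synchronization he advises against: each reduce task blocks in setup() until its predecessor has created a "done" znode, then creates its own in cleanup(). The connect string, the /reduce-order path (assumed to already exist), and reusing the partition number as the sequence position are all illustrative assumptions of mine:

import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class OrderedReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private ZooKeeper zk;
  private int seq; // this task's position in the desired start order

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    // Assumes the Partitioner maps keys so that partition number ==
    // desired sequence position; that is NOT true in general.
    seq = context.getTaskAttemptID().getTaskID().getId();
    zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() {
      public void process(WatchedEvent event) { /* no-op */ }
    });
    try {
      if (seq > 0) {
        waitForZnode("/reduce-order/done-" + (seq - 1));
      }
    } catch (KeeperException e) {
      throw new IOException(e);
    }
  }

  // Block until the given znode exists, re-checking after each watch event.
  private void waitForZnode(String path)
      throws KeeperException, InterruptedException {
    while (true) {
      final CountDownLatch latch = new CountDownLatch(1);
      Watcher w = new Watcher() {
        public void process(WatchedEvent event) { latch.countDown(); }
      };
      if (zk.exists(path, w) != null) {
        return; // predecessor already finished
      }
      latch.await(); // woken by the watch; loop and re-check
    }
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values,
      Context context) throws IOException, InterruptedException {
    // ... normal reduce work ...
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    try {
      // Signal the next task in line that this one is done.
      zk.create("/reduce-order/done-" + seq, new byte[0],
          Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException e) {
      throw new IOException(e);
    } finally {
      zk.close();
    }
  }
}

One design risk worth stating plainly: a reducer blocked in setup() still occupies a reduce slot, so if the scheduler happens to start task N before task N-1 and the cluster runs out of slots, the job deadlocks. That is a large part of why this pattern is best avoided.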