I was wondering if anyone knows of any examples of truly chained, truly
distributed MapReduce jobs.
So far I've had trouble finding examples of MapReduce jobs that are
kicked off by some one-time process and that in turn kick off other
MapReduce jobs long after the initial driver process is dead. This
would be more distributed and fault tolerant, since it removes the
dependency on a single driver process.
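
The kind of thing I'm imagining is a reducer that submits the next job
itself when it finishes, so nothing outside the cluster has to stay
alive. Here's a rough sketch against the old org.apache.hadoop.mapred
API -- the class name, paths, and job name are all made up, and the
identity classes just stand in for real logic:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class ChainingReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

      public void reduce(Text key, Iterator<Text> values,
                         OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        while (values.hasNext()) {
          out.collect(key, values.next());  // real reduce logic goes here
        }
      }

      // Runs once per reduce task after all records are processed.
      // With several reduce tasks this would submit the next job more
      // than once, so presumably only one task (or a DFS marker-file
      // check) should do the submitting.
      public void close() throws IOException {
        JobConf next = new JobConf(ChainingReducer.class);
        next.setJobName("next-stage");                // made-up name
        next.setMapperClass(IdentityMapper.class);    // stand-in logic
        next.setReducerClass(IdentityReducer.class);  // stand-in logic
        FileInputFormat.setInputPaths(next, new Path("stage1/out"));
        FileOutputFormat.setOutputPath(next, new Path("stage2/out"));
        // submitJob() returns immediately; the cluster runs the job
        // with no driver process left alive anywhere.
        new JobClient(next).submitJob(next);
      }
    }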
I looked at the Nutch crawl code, for example, which iteratively builds
up a URL db using successive MapReduces up to a certain depth. But this
is all done from within a for loop in a single process, even though each
individual MapReduce is distributed.
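
For reference, the Nutch-style driver loop looks roughly like this (a
sketch, again against the old mapred API; the paths are made up, the
identity classes stand in for the real crawl logic, and I'm assuming
Text/Text SequenceFiles between rounds):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class CrawlDriver {
      public static void main(String[] args) throws Exception {
        int depth = Integer.parseInt(args[0]);
        Path db = new Path("crawldb/round0");  // made-up path
        // Each round is a fully distributed MapReduce, but the
        // chaining happens here, inside one long-lived process.
        for (int i = 0; i < depth; i++) {
          JobConf job = new JobConf(CrawlDriver.class);
          job.setJobName("crawl-round-" + i);
          job.setMapperClass(IdentityMapper.class);    // stand-in for the
          job.setReducerClass(IdentityReducer.class);  // real crawl logic
          job.setInputFormat(SequenceFileInputFormat.class);
          job.setOutputFormat(SequenceFileOutputFormat.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);
          FileInputFormat.setInputPaths(job, db);
          Path next = new Path("crawldb/round" + (i + 1));
          FileOutputFormat.setOutputPath(job, next);
          JobClient.runJob(job);  // blocks until this round finishes
          db = next;              // this round's output feeds the next
        }
      }
    }

Each round is distributed, but if the process running this loop dies,
the whole chain stops.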
Also, I notice that both Google's and Hadoop's examples of the
distributed sort fail to deal with the fact that the result is multiple
sorted files... this isn't a complete sort, since the output files
still need to be merge-sorted, don't they? To complete the algorithm,
could the Reducer kick off a subsequent merge-sort MapReduce on the
result files?
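
Concretely, the merge pass I'm imagining is just an identity job with a
single reduce task, so the framework's own shuffle sort does the
merging -- something a Reducer could submit the same way as the close()
sketch above. A sketch, assuming the sort job wrote Text/Text
SequenceFiles (the paths are made up):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class MergeSortedOutputs {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(MergeSortedOutputs.class);
        job.setJobName("merge-sorted-outputs");
        // Identity map and reduce: the shuffle sort does all the
        // work of merging the per-reducer files by key.
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);
        job.setInputFormat(SequenceFileInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1);  // one reducer => one sorted file
        FileInputFormat.setInputPaths(job, new Path("sort/out"));
        FileOutputFormat.setOutputPath(job, new Path("sort/merged"));
        JobClient.runJob(job);
      }
    }

Of course a single reduce task isn't distributed either, so maybe that
just moves the problem rather than solving it.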
Or maybe there's something I'm not understanding...