Re: Number of Reduce Outputs

Frédéric Bertin Tue, 29 Aug 2006 10:54:12 -0700

I asked myself the same question when I first started with Hadoop, and I think it's a good candidate for the FAQ ;)

Generally speaking, IMO Hadoop (the MapReduce part) needs some kind of JobListener interface that lets users write custom callbacks, called at strategic moments of a job's life and executed on a single machine.

Dennis's problem could then be solved using a MergeOutputFilesListener.

This could also allow more complex things, like notifying people of job results by mail, etc., though that kind of example may be outside Hadoop's scope. However, just publishing the listener interface would help make Hadoop more pluggable and allow people to contribute useful extensions, even if they are not focused on Hadoop's core.
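
To make the idea concrete, here is a rough sketch of what such a listener might look like. Everything in it is hypothetical: the JobListener interface, its jobCompleted callback, and the MergeOutputFilesListener class do not exist in Hadoop. It also assumes the job output sits in a local directory reachable through java.nio, not in DFS, so treat it as an illustration of the callback shape only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types are part of Hadoop.
class JobListenerSketch {

    /** Hypothetical callback, assumed to run once, on a single machine, after the job ends. */
    interface JobListener {
        void jobCompleted(Path outputDir) throws IOException;
    }

    /** Concatenates the part-xxxxx files of a job, in file-name order, into one file. */
    static class MergeOutputFilesListener implements JobListener {
        private final String mergedName;

        MergeOutputFilesListener(String mergedName) {
            this.mergedName = mergedName;
        }

        @Override
        public void jobCompleted(Path outputDir) throws IOException {
            // Collect only the reduce outputs; the merged file itself never matches "part-*".
            List<Path> parts = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
                for (Path p : stream) {
                    parts.add(p);
                }
            }
            Collections.sort(parts); // part-00000, part-00001, ...

            // Append each part to the merged file, byte for byte.
            try (OutputStream out = Files.newOutputStream(outputDir.resolve(mergedName))) {
                for (Path part : parts) {
                    Files.copy(part, out);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a finished job that wrote two part files.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-00001"), "b\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00000"), "a\n".getBytes(StandardCharsets.UTF_8));

        new MergeOutputFilesListener("merged.txt").jobCompleted(dir);
        System.out.print(new String(
                Files.readAllBytes(dir.resolve("merged.txt")), StandardCharsets.UTF_8));
    }
}
```

Note that this only concatenates the parts in file-name order. If the job relies on keys being sorted across the whole output, a plain concatenation would not give that unless the partitioning already guarantees it.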


WDYT?

Fred


Doug Cutting wrote:

To generate a single output file, specify just a single reduce task. If your reducer isn't doing much computation, then it might be faster to do this in the original job, otherwise use a subsequent job.
Doug

Dennis Kubes wrote:
This is probably a simple question but when I run my MR job I am getting 10 splits and therefore 10 output files like part-xxxxx. Is there a way to merge those outputs into a single file using the currently running MR job or do I need to run another MR job to merge them?
Dennis Kubes
