I asked myself the same question when I first started with Hadoop, and I think it's a good candidate for the FAQ ;)

Generally speaking, IMO Hadoop (the MapReduce part) needs some kind of JobListener interface that lets users write custom callbacks, invoked at strategic moments in a job's life and executed on a single machine.
Dennis's problem could then be solved using a MergeOutputFilesListener.
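
To make the idea a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: neither JobListener nor MergeOutputFilesListener exists in Hadoop, the method names are made up, and the merge works on a plain local directory through java.nio rather than on Hadoop's own file system, just to keep the example self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical callback interface (not part of Hadoop): the framework would
// invoke these methods on a single machine at key moments of a job's life.
interface JobListener {
    void jobStarted(String jobName);
    void jobCompleted(String jobName, Path outputDir) throws IOException;
}

// Hypothetical listener that concatenates all part-xxxxx files of a finished
// job into one file. Assumption: the output lives in a local directory.
class MergeOutputFilesListener implements JobListener {
    private final String mergedFileName;

    MergeOutputFilesListener(String mergedFileName) {
        this.mergedFileName = mergedFileName;
    }

    @Override
    public void jobStarted(String jobName) {
        // Nothing to do before the job runs.
    }

    @Override
    public void jobCompleted(String jobName, Path outputDir) throws IOException {
        // Collect the part files and sort them so the merge order is stable.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : stream) {
                parts.add(part);
            }
        }
        Collections.sort(parts);

        // Append each part file, in order, to the single merged file.
        Path merged = outputDir.resolve(mergedFileName);
        try (OutputStream out = Files.newOutputStream(merged)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

With something like this, the job submitter would register a `new MergeOutputFilesListener("merged.txt")` and the framework would call `jobCompleted` once, after the last reduce task has finished. How listeners get registered is left open here.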

This would also allow more complex things, such as notifying people of job results by mail, though that kind of example may be outside Hadoop's scope. Still, just publishing the listener interface would help make Hadoop more pluggable and let people contribute useful extensions, even ones that are not focused on Hadoop's core.

WDYT?

Fred


Doug Cutting wrote:
To generate a single output file, specify just a single reduce task. If your reducer isn't doing much computation, then it might be faster to do this in the original job; otherwise, use a subsequent job.

Doug

Dennis Kubes wrote:
This is probably a simple question, but when I run my MR job I am getting 10 splits and therefore 10 output files named like part-xxxxx. Is there a way to merge those outputs into a single file using the currently running MR job, or do I need to run another MR job to merge them?

Dennis Kubes
