I asked myself the same question when I started with Hadoop, and I
think it's a good candidate for the FAQ ;)
Generally speaking, IMO Hadoop (the MapReduce part) needs some kind of
JobListener interface that would let users write custom callbacks,
called at strategic moments of a job's life and executed on a single
machine.
Dennis's problem could then be solved using a MergeOutputFilesListener.
This could also enable more complex things, like notifying people of
job results by mail, though that kind of example may be outside
Hadoop's scope. However, just publishing the listener interface would
help make Hadoop more pluggable and let people contribute useful
extensions, even ones that are not focused on Hadoop's core.
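
To make the idea a bit more concrete, here is a rough sketch of what I
have in mind. Everything in it is hypothetical: neither JobListener nor
MergeOutputFilesListener exists in Hadoop, and the jobCompleted()
callback and its signature are only my guess at what such an interface
could look like. To keep the sketch self-contained it merges files in a
local directory with plain java.nio; a real listener would presumably
go through Hadoop's own file system layer instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical callback interface; nothing like it exists in Hadoop.
interface JobListener {
    // Assumed to be called once, on a single machine, after the job ends.
    void jobCompleted(Path outputDir);
}

// Hypothetical listener that concatenates the part-xxxxx files of a job
// into one file inside the same (local) output directory.
class MergeOutputFilesListener implements JobListener {
    private final String mergedName;

    MergeOutputFilesListener(String mergedName) {
        this.mergedName = mergedName;
    }

    @Override
    public void jobCompleted(Path outputDir) {
        try {
            // Collect the part files first, sorted by name, so the merged
            // content follows the part numbering.
            List<Path> parts;
            try (Stream<Path> files = Files.list(outputDir)) {
                parts = files
                    .filter(p -> p.getFileName().toString().startsWith("part-"))
                    .sorted()
                    .collect(Collectors.toList());
            }
            // Append each part, byte for byte, to the merged file.
            try (OutputStream out =
                     Files.newOutputStream(outputDir.resolve(mergedName))) {
                for (Path part : parts) {
                    Files.copy(part, out);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, the framework would only have to call
new MergeOutputFilesListener("merged.txt").jobCompleted(outputDir)
once the last task has finished. Note that plain concatenation does not
give a globally sorted file; it only glues the parts together in order.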
WDYT?
Fred
Doug Cutting wrote:
To generate a single output file, specify just a single reduce task.
If your reducer isn't doing much computation, then it might be faster
to do this in the original job; otherwise, use a subsequent job.
Doug
Dennis Kubes wrote:
This is probably a simple question, but when I run my MR job I get
10 splits and therefore 10 output files named like part-xxxxx. Is
there a way to merge those outputs into a single file using the
currently running MR job, or do I need to run another MR job to merge
them?
Dennis Kubes