Number of Reduce Outputs

2006-08-29 Thread Dennis Kubes
This is probably a simple question, but when I run my MR job I get 10 splits and therefore 10 output files named like part-x. Is there a way to merge those outputs into a single file within the currently running MR job, or do I need to run another MR job to merge them? Dennis Kubes

Re: Number of Reduce Outputs

2006-08-29 Thread Doug Cutting
To generate a single output file, specify just a single reduce task. If your reducer isn't doing much computation, then it might be faster to do this in the original job; otherwise, use a subsequent job. Doug
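To illustrate why a single reduce task yields a single output file, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop itself; it only imitates hash partitioning of keys across reduce tasks, where each task writes one part file. The class name `ReducePartitionSketch`, its methods, and the `part-%05d` file naming are assumptions made for illustration. In a real job using the classic `org.apache.hadoop.mapred` API, the corresponding setting would be `JobConf.setNumReduceTasks(1)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: simulates how keys are spread over reduce tasks.
public class ReducePartitionSketch {

    // Hash partitioning: every key is assigned to exactly one reduce task.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns a map from (assumed) output file name to the keys that land in it.
    // One entry is created per reduce task, even if that task receives no keys.
    static Map<String, List<String>> partition(List<String> keys, int numReduceTasks) {
        Map<String, List<String>> outputs = new TreeMap<>();
        for (int i = 0; i < numReduceTasks; i++) {
            outputs.put(String.format("part-%05d", i), new ArrayList<>());
        }
        for (String key : keys) {
            int p = partitionFor(key, numReduceTasks);
            outputs.get(String.format("part-%05d", p)).add(key);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "date");
        // Ten reduce tasks give ten output files; one reduce task gives one.
        System.out.println(partition(keys, 10).size() + " output files with 10 reduce tasks");
        System.out.println(partition(keys, 1).size() + " output file with 1 reduce task");
    }
}
```

The trade-off Doug mentions shows up here as well: with one reduce task, every key passes through a single reducer, so this only pays off in the original job when the reduce step is cheap.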

Re: Number of Reduce Outputs

2006-08-29 Thread Frédéric Bertin
I asked myself the same question when I first started with Hadoop, and I think it's a good candidate for the FAQ ;) Generally speaking, IMO there is a need in Hadoop (the MapReduce part) for some kind of JobListener interface that would allow writing custom callbacks invoked at strategic moments of a job's life, and