On Sat, Nov 10, 2007 at 07:56:22PM +0000, Holger Stenzhorn wrote:
>Hello,
>
>For testing purposes I am running Hadoop in local mode.
>Is there a possibility to split the output (TextOutputFormat) of a 
>MapReduce job into several output files (e.g. "part-0000", "part-0001", 
>etc.) according to some maximal file size per file?

I'd say the easiest way is to do the splitting as a post-processing step after 
your job...
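
One way to sketch that post-processing step, assuming GNU split is available (the filenames and the 1k size cap here are illustrative, not anything Hadoop produces for you):

```shell
#!/bin/sh
# Demo: split a single job output file into size-capped pieces without
# breaking a record mid-line. "part-00000" stands in for the job's output.
set -e
seq 1 2000 > part-00000             # stand-in for the job's single output file
split -C 1k -d part-00000 split-    # GNU split: -C caps bytes per piece but
                                    # keeps each line whole; -d numeric suffixes
ls split-*
```

The -C flag matters for TextOutputFormat output: a plain byte split (-b) could cut a key/value line in half.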

You could run your job with multiple reduces to get multiple files (each reduce 
produces one output file). Depending on your Partitioner you can control how 
much data each reducer receives. (see org.apache.hadoop.mapred.Partitioner javadoc)
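
A minimal, self-contained sketch of the idea: Hadoop's default HashPartitioner uses the formula below to decide which reduce (and hence which part-NNNN file) a key lands in. The real interface is org.apache.hadoop.mapred.Partitioner; plain Strings stand in for Writable key types here so the example runs on its own.

```java
// Sketch of hash partitioning, mirroring Hadoop's default HashPartitioner.
public class PartitionSketch {
    // Mirrors getPartition(key, value, numPartitions): mask off the sign
    // bit so the result is a non-negative index in [0, numPartitions).
    public static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reduces = 4;  // number of reduces = number of output files
        for (String key : new String[] {"alpha", "beta", "gamma", "delta"}) {
            System.out.println(key + " -> reduce " + getPartition(key, reduces));
        }
    }
}
```

To control the distribution yourself, implement Partitioner with your own getPartition logic and register it on the JobConf via setPartitionerClass.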

hth,
Arun

>I.e. is there a setting, such as a maximal file size, that can be set in 
>hadoop-site.xml, for example?
>Even after reading the documentation and mailing list I did not find a 
>simple solution...  I really appreciate your help!
>
>Cheers,
>Holger
