Why not use a combiner?
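With a pass-through map/reduce like yours (quoted below), the reducer class can be reused as the combiner so records are pre-merged on the map side before the shuffle. A minimal driver sketch, assuming the new (mapreduce) API; MergeJob is an illustrative name, and WholeFileInputFormat stands in for your custom whole-file reader (a sketch of it follows the quoted message below):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MergeJob {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "merge");
            job.setJarByClass(MergeJob.class);

            // Stand-in for your custom whole-file input format.
            job.setInputFormatClass(WholeFileInputFormat.class);

            job.setMapperClass(MapClass.class);
            // Reuse the pass-through Reduce as the combiner, per the
            // suggestion above.
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setNumReduceTasks(1);   // one reducer -> one output file

            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(BytesWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }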
On Jul 30, 2012, at 7:59 PM, Mike S wrote:

> As asked several times, I need to merge my reducers' output files.
> Imagine I have many reducers which will generate 200 files. To merge
> them together, I have written another MapReduce job where each mapper
> reads a complete file fully into memory and outputs it, and then a
> single reducer merges them all together. To do so, I had to write a
> custom FileInputFormat that reads the complete file into memory and a
> custom FileOutputFormat that appends each record's bytes together.
> This is how my mapper and reducer look:
>
>     public static class MapClass extends Mapper<NullWritable,
>             BytesWritable, NullWritable, BytesWritable>
>     {
>         @Override
>         public void map(NullWritable key, BytesWritable value,
>                 Context context) throws IOException, InterruptedException
>         {
>             context.write(key, value);
>         }
>     }
>
>     public static class Reduce extends Reducer<NullWritable,
>             BytesWritable, NullWritable, BytesWritable>
>     {
>         @Override
>         public void reduce(NullWritable key, Iterable<BytesWritable> values,
>                 Context context) throws IOException, InterruptedException
>         {
>             for (BytesWritable value : values)
>             {
>                 context.write(NullWritable.get(), value);
>             }
>         }
>     }
>
> I still have to have one reducer and that is a bottleneck. Please note
> that I must do this merging, as the users of my MR job are outside my
> Hadoop environment and need the result as one file.
>
> Is there a better way to merge the reducers' output files?
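For completeness, the whole-file input format you describe (but don't show) might look roughly like this; WholeFileInputFormat and WholeFileRecordReader are illustrative names, not your actual code. Marking the files non-splittable guarantees one mapper sees one whole file:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileInputFormat
            extends FileInputFormat<NullWritable, BytesWritable> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;  // one mapper per file, never split
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new WholeFileRecordReader();
        }

        static class WholeFileRecordReader
                extends RecordReader<NullWritable, BytesWritable> {
            private FileSplit split;
            private TaskAttemptContext context;
            private final BytesWritable value = new BytesWritable();
            private boolean processed = false;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context) {
                this.split = (FileSplit) split;
                this.context = context;
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) {
                    return false;
                }
                // Slurp the entire file into a single BytesWritable value.
                byte[] contents = new byte[(int) split.getLength()];
                Path file = split.getPath();
                FileSystem fs = file.getFileSystem(context.getConfiguration());
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }

            @Override
            public NullWritable getCurrentKey() { return NullWritable.get(); }

            @Override
            public BytesWritable getCurrentValue() { return value; }

            @Override
            public float getProgress() { return processed ? 1.0f : 0.0f; }

            @Override
            public void close() { }
        }
    }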