Why not use a combiner?
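With a pass-through map/reduce like yours (quoted below), the reducer class can be reused as the combiner so records are pre-merged on the map side before the shuffle. A minimal driver sketch, assuming the new (mapreduce) API; MergeJob is an illustrative name, and WholeFileInputFormat stands in for your custom whole-file reader (a sketch of it follows the quoted message below):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MergeJob {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "merge");
            job.setJarByClass(MergeJob.class);

            // Stand-in for your custom whole-file input format.
            job.setInputFormatClass(WholeFileInputFormat.class);

            job.setMapperClass(MapClass.class);
            // Reuse the pass-through Reduce as the combiner, per the
            // suggestion above.
            job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            job.setNumReduceTasks(1);   // one reducer -> one output file

            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(BytesWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }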
On Jul 30, 2012, at 7:59 PM, Mike S wrote:

> As asked several times, I need to merge my reducers' output files.
> Imagine I have many reducers which will generate 200 files. To merge
> them together, I have written another MapReduce job where each mapper
> reads a complete file fully into memory and outputs it, and then a
> single reducer merges them all together. To do so, I had to write a
> custom FileInputFormat that reads the complete file into memory and a
> custom FileOutputFormat that appends each record's bytes together.
> This is how my mapper and reducer look:
>
>     public static class MapClass extends Mapper<NullWritable,
>             BytesWritable, NullWritable, BytesWritable>
>     {
>         @Override
>         public void map(NullWritable key, BytesWritable value,
>                 Context context) throws IOException, InterruptedException
>         {
>             context.write(key, value);
>         }
>     }
>
>     public static class Reduce extends Reducer<NullWritable,
>             BytesWritable, NullWritable, BytesWritable>
>     {
>         @Override
>         public void reduce(NullWritable key, Iterable<BytesWritable> values,
>                 Context context) throws IOException, InterruptedException
>         {
>             for (BytesWritable value : values)
>             {
>                 context.write(NullWritable.get(), value);
>             }
>         }
>     }
>
> I still have to have one reducer and that is a bottleneck. Please note
> that I must do this merging, as the users of my MR job are outside my
> Hadoop environment and need the result as one file.
>
> Is there a better way to merge the reducers' output files?
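For completeness, the whole-file input format you describe (but don't show) might look roughly like this; WholeFileInputFormat and WholeFileRecordReader are illustrative names, not your actual code. Marking the files non-splittable guarantees one mapper sees one whole file:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileInputFormat
            extends FileInputFormat<NullWritable, BytesWritable> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;  // one mapper per file, never split
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new WholeFileRecordReader();
        }

        static class WholeFileRecordReader
                extends RecordReader<NullWritable, BytesWritable> {
            private FileSplit split;
            private TaskAttemptContext context;
            private final BytesWritable value = new BytesWritable();
            private boolean processed = false;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context) {
                this.split = (FileSplit) split;
                this.context = context;
            }

            @Override
            public boolean nextKeyValue() throws IOException {
                if (processed) {
                    return false;
                }
                // Slurp the entire file into a single BytesWritable value.
                byte[] contents = new byte[(int) split.getLength()];
                Path file = split.getPath();
                FileSystem fs = file.getFileSystem(context.getConfiguration());
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }

            @Override
            public NullWritable getCurrentKey() { return NullWritable.get(); }

            @Override
            public BytesWritable getCurrentValue() { return value; }

            @Override
            public float getProgress() { return processed ? 1.0f : 0.0f; }

            @Override
            public void close() { }
        }
    }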