As has been asked several times before, I need to merge my reducers' output files. Imagine I have a job with many reducers that produces 200 output files. To merge them, I have written another MapReduce job in which each mapper reads one complete file into memory and emits it, and a single reducer merges them all together. To do this, I had to write a custom FileInputFormat that reads an entire file into memory, and a custom FileOutputFormat that appends each record's bytes together.
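For reference, a minimal sketch of that whole-file input format follows. I have tidied it up for this post, and the class names (WholeFileInputFormat, WholeFileRecordReader) are just placeholders for my actual ones. It marks files as non-splittable and has its record reader hand the entire file to the mapper as a single BytesWritable:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Each file must go to exactly one mapper in full, so never split it.
        return false;
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit split;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false; // the single record has already been emitted
        }
        // Slurp the whole file into one BytesWritable.
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.readFully(in, contents, 0, contents.length);
        }
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() {
        return NullWritable.get();
    }

    @Override
    public BytesWritable getCurrentValue() {
        return value;
    }

    @Override
    public float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void close() {
        // nothing held open between calls
    }
}

Obviously this only works as long as every individual file fits in a mapper's heap.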
This is how my mapper and reducer look:

public static class MapClass
        extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable> {

    @Override
    public void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Pass the whole file's bytes straight through; every record shares
        // the same NullWritable key, so they all meet in a single reduce call.
        context.write(key, value);
    }
}

public static class Reduce
        extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable> {

    @Override
    public void reduce(NullWritable key, Iterable<BytesWritable> values, Context context)
            throws IOException, InterruptedException {
        // Concatenate all file contents by writing them out one after another.
        for (BytesWritable value : values) {
            context.write(NullWritable.get(), value);
        }
    }
}

I still have to use a single reducer, and that is a bottleneck. Please note that I must do this merging, because the users of my MR job are outside my Hadoop environment and need the result as one file. Is there a better way to merge the reducers' output files?
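For reference, this is roughly how I wire the job together. WholeFileInputFormat is the input format sketched above, and BytesAppendingOutputFormat stands in for my custom byte-appending output format (both names are placeholders for my actual classes):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "merge reducer outputs");
        job.setJarByClass(MergeDriver.class);

        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);
        // A single reducer forces all file contents through one task,
        // which is exactly the bottleneck described above.
        job.setNumReduceTasks(1);

        job.setInputFormatClass(WholeFileInputFormat.class);
        // Placeholder for the custom byte-appending output format.
        job.setOutputFormatClass(BytesAppendingOutputFormat.class);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}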