As has been asked several times, I need to merge my reducers' output files.
Imagine I have many reducers that together generate 200 files. To merge
them, I wrote another MapReduce job in which each mapper reads one
complete file fully into memory and emits it, and a single reducer then
merges everything together. To do so, I had to write a custom
FileInputFormat that reads a complete file into memory, and a custom
FileOutputFormat that appends each reducer item's bytes together. Here is
what my mapper and reducer look like (a sketch of the input-format side
follows after them):

// Imports needed by the classes below:
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

        public static class MapClass
                extends Mapper<NullWritable, BytesWritable, NullWritable, BytesWritable>
        {
                // Each input record is one complete file; emit it unchanged.
                @Override
                public void map(NullWritable key, BytesWritable value, Context context)
                        throws IOException, InterruptedException
                {
                        context.write(key, value);
                }
        }

        public static class Reduce
                extends Reducer<NullWritable, BytesWritable, NullWritable, BytesWritable>
        {
                // Every record shares the single NullWritable key, so the one
                // reducer sees all file contents and writes them out back to back.
                @Override
                public void reduce(NullWritable key, Iterable<BytesWritable> values,
                        Context context) throws IOException, InterruptedException
                {
                        for (BytesWritable value : values)
                        {
                                context.write(NullWritable.get(), value);
                        }
                }
        }
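
For reference, the whole-file input format I mentioned can be sketched
along these lines (a simplified illustration, not my exact code; the
class names WholeFileInputFormat and WholeFileRecordReader are just
placeholders). It marks each file as unsplittable and hands it to the
mapper as a single BytesWritable record:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable>
{
        @Override
        protected boolean isSplitable(JobContext context, Path file)
        {
                return false; // never split: each file is one record
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context)
        {
                return new WholeFileRecordReader();
        }
}

class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable>
{
        private FileSplit split;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
        {
                this.split = (FileSplit) split;
                this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException
        {
                if (processed)
                {
                        return false;
                }
                // Read the whole file into memory in one go.
                byte[] contents = new byte[(int) split.getLength()];
                Path file = split.getPath();
                FileSystem fs = file.getFileSystem(conf);
                FSDataInputStream in = null;
                try
                {
                        in = fs.open(file);
                        IOUtils.readFully(in, contents, 0, contents.length);
                        value.set(contents, 0, contents.length);
                }
                finally
                {
                        IOUtils.closeStream(in);
                }
                processed = true;
                return true;
        }

        @Override
        public NullWritable getCurrentKey() { return NullWritable.get(); }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { /* stream is closed in nextKeyValue() */ }
}

The driver then wires this in with job.setInputFormatClass(WholeFileInputFormat.class)
and forces the merge with job.setNumReduceTasks(1).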

I still have to have exactly one reducer, and that is a bottleneck.
Please note that this merging is mandatory: the users of my MR job are
outside my Hadoop environment and need the result as a single file.

Is there a better way to merge the reducers' output files?
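
(For comparison, the usual non-MapReduce route I know of is a
client-side merge with the HDFS shell, along the lines of:

        hadoop fs -getmerge /path/to/job/output /local/path/merged-file

The paths above are placeholders. That concatenates the part files onto
the client machine rather than inside the job, so it would add an extra
manual step before I can hand the single file to my users.)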
