Is there a requirement for the final reduced file to be sorted? If not, wouldn't a map-only job (plus a combiner) followed by a merge-only job provide the answer?
Raj

>________________________________
> From: Michael Segel <michael_se...@hotmail.com>
>To: common-user@hadoop.apache.org
>Sent: Tuesday, July 31, 2012 5:24 AM
>Subject: Re: Merge Reducers Output
>
>You really don't want to run a single reducer unless you know that you
>don't have a lot of mappers.
>
>As long as the output data types and structure are the same as the input,
>you can run your code as the combiner, and then run it again as the
>reducer. Problem solved with one or two lines of code.
>If your input and output don't match, then you can use the existing code
>as a combiner, and then write a new reducer. It could just as easily be an
>identity reducer too. (I don't know the exact problem.)
>
>So here's a silly question: why wouldn't you want to run a combiner?
>
>
>On Jul 31, 2012, at 12:08 AM, Jay Vyas <jayunit...@gmail.com> wrote:
>
>> It's not clear to me that you need custom input formats...
>>
>> 1) Getmerge might work, or
>>
>> 2) Simply run a SINGLE-reducer job (have the mappers output a static
>> final int key = 1, or specify numReducers = 1).
>>
>> In this case, only one reducer will be called, and it will read through
>> all the values.
>>
>> On Tue, Jul 31, 2012 at 12:30 AM, Bejoy KS <bejoy.had...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Why not use 'hadoop fs -getmerge <outputFolderInHdfs>
>>> <targetFileNameInLfs>' while copying files out of HDFS for the end
>>> users to consume? This will merge all the files in 'outputFolderInHdfs'
>>> into one file and put it in the local file system.
>>>
>>> Regards
>>> Bejoy KS
>>>
>>> Sent from handheld, please excuse typos.
>>>
>>> -----Original Message-----
>>> From: Michael Segel <michael_se...@hotmail.com>
>>> Date: Mon, 30 Jul 2012 21:08:22
>>> To: <common-user@hadoop.apache.org>
>>> Reply-To: common-user@hadoop.apache.org
>>> Subject: Re: Merge Reducers Output
>>>
>>> Why not use a combiner?
>>>
>>> On Jul 30, 2012, at 7:59 PM, Mike S wrote:
>>>
>>>> As has been asked several times before, I need to merge my reducers'
>>>> output files. Imagine I have many reducers which will generate 200
>>>> files. To merge them together, I have written another MapReduce job
>>>> where each mapper reads a complete file fully into memory and outputs
>>>> it, and then a single reducer merges everything together. To do so, I
>>>> had to write a custom FileInputFormat that reads a complete file into
>>>> memory, and another custom FileOutputFormat that appends each record's
>>>> bytes together. This is how my mapper and reducer look:
>>>>
>>>> public static class MapClass extends Mapper<NullWritable,
>>>> BytesWritable, NullWritable, BytesWritable>
>>>> {
>>>>     @Override
>>>>     public void map(NullWritable key, BytesWritable value,
>>>>             Context context) throws IOException, InterruptedException
>>>>     {
>>>>         // Pass the whole file's bytes straight through.
>>>>         context.write(key, value);
>>>>     }
>>>> }
>>>>
>>>> public static class Reduce extends Reducer<NullWritable,
>>>> BytesWritable, NullWritable, BytesWritable>
>>>> {
>>>>     @Override
>>>>     public void reduce(NullWritable key, Iterable<BytesWritable> values,
>>>>             Context context) throws IOException, InterruptedException
>>>>     {
>>>>         // Concatenate every file's bytes into the single output.
>>>>         for (BytesWritable value : values)
>>>>         {
>>>>             context.write(NullWritable.get(), value);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> I still have to have one reducer, and that is a bottleneck. Please
>>>> note that I must do this merging, as the users of my MR job are
>>>> outside my Hadoop environment and need the result as one file.
>>>>
>>>> Is there a better way to merge the reducers' output files?
>>>>
>>>
>>>
>>
>>
>> --
>> Jay Vyas
>> MMSB/UCHC
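For reference, the combiner and single-reducer suggestions in the thread amount to a few lines in the job driver. Below is a minimal sketch, not code from the thread itself: it assumes the MapClass and Reduce classes from Mike's post, and WholeFileInputFormat is a hypothetical stand-in for the custom whole-file input format he describes but does not include.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "merge-output-files");
        job.setJarByClass(MergeJobDriver.class);

        // Hypothetical stand-in for the custom input format described in
        // the thread (reads one whole file per record).
        job.setInputFormatClass(WholeFileInputFormat.class);
        job.setMapperClass(MapClass.class);

        // Michael's point: when the reduce logic's input and output types
        // match, the same class can be registered as the combiner as well.
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Jay's option 2: one reducer yields one output file (part-r-00000).
        job.setNumReduceTasks(1);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(BytesWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that registering a combiner is only safe when the reduce logic is associative and its input and output key/value types match, exactly the caveat Michael raises above; here the Reduce class simply re-emits values, so it is harmless as a combiner.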
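Bejoy's 'hadoop fs -getmerge' approach also has a programmatic equivalent, FileUtil.copyMerge in the Hadoop FileSystem API, which is useful if the merge has to happen inside a Java tool rather than from the shell. A sketch; the two paths are placeholders, not paths from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GetMergeEquivalent {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);

        // Concatenates every file under the HDFS output directory into a
        // single local file, like 'hadoop fs -getmerge <outputFolderInHdfs>
        // <targetFileNameInLfs>'.
        FileUtil.copyMerge(hdfs, new Path("/user/mike/output"),    // placeholder
                           local, new Path("/tmp/merged-result"),  // placeholder
                           false,  // do not delete the HDFS source files
                           conf,
                           null);  // no separator string between files
    }
}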