Re:Re:Re:Re: one question in the book of "hadoop:definitive guide 2 edition"

Daniel,Wu Wed, 03 Aug 2011 23:08:12 -0700

Thanks John,

I am confused again by the result of my test case. Could you please take a
look? The relevant code is:


  public static class IntSumReducer
       extends Reducer<IntPair,NullWritable,IntPair,NullWritable> {

    @Override
    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int count = 0;
      // Print the key as it appears at each step of the iteration.
      for (NullWritable iw : values) {
        count++;
        System.out.print(key.getFirst());
        System.out.print(" : ");
        System.out.println(key.getSecond());
      }
      System.out.println("number of records for this group " + count);
      System.out.println("-----------------biggest key is--------------------------");
      // Print the key as it stands after the loop, then emit it.
      System.out.print(key.getFirst());
      System.out.print("   -----    ");
      System.out.println(key.getSecond());
      context.write(key, NullWritable.get());
    }
  }

I am using the new API (the release is from Cloudera). The output shows that
each call to reduce processes 100 records. Since reduce is declared as
reduce(IntPair key, Iterable<NullWritable> values, Context context), I
expected the key to stay fixed for the whole call, but the strange thing is
that the key is different on each pass through the Iterable<NullWritable>
values loop. Going by your explanation, the same pair (0:97) should be
printed 100 times, but what I actually get is 0:97, 0:97, 0:96 ... 0:0, as
below:


0 : 97
0 : 97
0 : 96
0 : 96
0 : 94
0 : 93
0 : 93
0 : 91
0 : 90
0 : 89
0 : 86
0 : 85
....   deleted to save space
0 : 2
0 : 1
0 : 1
0 : 0
0 : 0
number of records for this group 100
-----------------biggest key is--------------------------
0   -----    0
4 : 99
4 : 99
4 : 98
4 : 96
4 : 95
4 : 94
4 : 93
4 : 92
4 : 91
4 : 91
4 : 90
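
A possible explanation, sketched in plain Java without any Hadoop dependency:
as far as I understand the new API, the framework reuses a single key object
and deserializes each record of the group into it while the values are being
iterated, so reading the key inside the loop shows the current record rather
than the first one. The Pair class and the groupValues, keysSeen and
firstKeyCopy methods below are hypothetical stand-ins for IntPair and the
reduce-side iteration; they are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class KeyReuseDemo {

    /** Hypothetical stand-in for IntPair: a mutable two-int key. */
    static final class Pair {
        int first;
        int second;

        void set(int f, int s) { first = f; second = s; }

        @Override
        public String toString() { return first + ":" + second; }
    }

    /**
     * Iterates over one group of already-sorted records. Each call to
     * next() overwrites the shared key object in place (an assumption
     * about how the framework behaves, modelled here by hand).
     */
    static Iterable<Object> groupValues(final Pair sharedKey, final int[][] records) {
        return () -> new Iterator<Object>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < records.length; }

            @Override
            public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                sharedKey.set(records[i][0], records[i][1]);
                i++;
                return new Object(); // placeholder value, carries no data
            }
        };
    }

    /** Records what the key looks like at each step, as the test reducer prints it. */
    static List<String> keysSeen(int[][] records) {
        Pair key = new Pair();
        key.set(records[0][0], records[0][1]);
        List<String> seen = new ArrayList<>();
        for (Object ignored : groupValues(key, records)) {
            seen.add(key.toString());
        }
        seen.add("after loop " + key);
        return seen;
    }

    /** Copies the key's fields before iterating, so the first key survives the loop. */
    static String firstKeyCopy(int[][] records) {
        Pair key = new Pair();
        key.set(records[0][0], records[0][1]);
        Pair copy = new Pair();
        copy.set(key.first, key.second);
        for (Object ignored : groupValues(key, records)) {
            // iterating mutates key, but not copy
        }
        return copy.toString();
    }

    public static void main(String[] args) {
        int[][] group = { {0, 97}, {0, 97}, {0, 96}, {0, 0} };
        System.out.println(keysSeen(group));
        System.out.println(firstKeyCopy(group));
    }
}
```

If this is indeed what happens, writing the key before the loop starts (or
copying its fields first, as firstKeyCopy does) would emit the first record
of the group instead of the last one.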





At 2011-08-03 20:02:34,"John Armstrong" <[email protected]> wrote:
>On Wed, 3 Aug 2011 10:35:51 +0800 (CST), "Daniel,Wu" <[email protected]> wrote:
>> So the key of a group is determined by the first coming record in the
>> group,  if we have 3 records in a group
>> 1: (1900,35)
>> 2:(1900,34)
>> 3:(1900,33)
>> 
>> if (1900,35) comes in as the first row, then the result key will be
>> (1900,35). When the second row (1900,34) comes in, it won't impact the
>> key of the group, meaning it will not overwrite the key (1900,35) with
>> (1900,34). Is that correct?
>
>Effectively, yes.  Remember that on the inside it's using the comparator
>something like this:
>
>(1900, 35).. do I have that key already? [searches collection of keys
>with, say, a BST] no! I'll add it here.
>(1900,34).. do I have that key already? [searches again, now getting a
>result of 0 when comparing to (1900,35)] yes! [it's not the same key, but
>according to the GroupComparator it is!] so I'll add its value to the key's
>iterable of values.
>etc.
