Hi John,

Another finding: if I remove the loop over the values (remove "for (NullWritable 
iw : values)"), then the result is the MAX temperature for each year, while my 
original test returned the MIN temperature for each year. The book also mentions 
that the value is mutable, and I think the key might be mutable too, meaning that 
as we loop over each value in Iterable<NullWritable>, the content of the key 
object is reset. Since the input is in sorted order, if we don't loop at all (as 
in the new test), the key left at the end of the reduce function is the first 
record in the group, which has the max value. If we loop over each value in the 
value list, say 100 times, the content of the key also changes 100 times, and the 
key left at the end of the reduce function is the last key, which has the MIN 
value. This theory of a mutable key explains how the test works. I just need to 
figure out why each iteration of "for (NullWritable iw : values)" can change the 
content of the key. If anyone knows, please help.
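
If object reuse is indeed the cause, a minimal way to check it (and to keep the 
max) would be to deep-copy the key inside the loop instead of holding a 
reference to it. This is only a sketch, assuming the IntPair used here 
implements Writable properly; it uses Hadoop's WritableUtils.clone:

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.WritableUtils;

    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context) throws IOException, InterruptedException {
      IntPair first = null;
      for (NullWritable iw : values) {
        if (first == null) {
          // Deep copy of the current key contents; a plain "first = key" would
          // keep pointing at the single object the framework refills on every
          // iteration.
          first = WritableUtils.clone(key, context.getConfiguration());
        }
      }
      // "first" still holds the first key of the group (the max temperature),
      // even though "key" itself now holds the last one (the min).
      context.write(first, NullWritable.get());
    }

For reference, the reduce function I actually tested is below: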

    public void reduce(IntPair key, Iterable<NullWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int count=0;
      /*for (NullWritable iw:values) {
            count++;
            System.out.print(key.getFirst());
            System.out.print(" : ");
            System.out.println(key.getSecond());
       }*/
    //  System.out.println("number of records for this group 
"+Integer.toString(count));
      System.out.println("-----------------biggest key 
is--------------------------");
      System.out.print(key.getFirst());
      System.out.print("   -----    ");
      System.out.println(key.getSecond());
      context.write(key, NullWritable.get());
     }
   }


-----------------biggest key is--------------------------
0   -----    97
-----------------biggest key is--------------------------
4   -----    99
-----------------biggest key is--------------------------
8   -----    99
-----------------biggest key is--------------------------
12   -----    97
-----------------biggest key is--------------------------
16   -----    98



At 2011-08-04 20:51:01,"John Armstrong" <[email protected]> wrote:
>On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu" <[email protected]>
>wrote:
>> I am using the new API (the release is from Cloudera).  We can see from
>> the output that for each call of the reduce function, 100 records were
>> processed, but as reduce is defined as
>> reduce(IntPair key, Iterable<NullWritable> values, Context context), the
>> key should be fixed (not change) during every single execution. The
>> strange thing is that for each loop over Iterable<NullWritable> values,
>> the key is different!  Using your explanation, the same information
>> (0:97) should be repeated 100 times, but actually it is 0:97, 0:97,
>> 0:96 ... 0:0 as below
>
>Ah, but they're NOT different! That's the whole point!
>
>Think carefully: how does Hadoop decide what keys are "the same" when
>sorting and grouping reducer inputs?  It uses a comparator.  If the
>comparator says compare(key1,key2)==0, then as far as Hadoop is concerned
>the keys are the same.
>
>So here the comparator only really checks the first int in the pair:
>
>"compare(0:97,0:96)?  well let's compare 0 and 0...
>Integer.compare(0,0)==0, so these are the same key."
>
>You have to be careful about the semantics of "equality" whenever you're
>using nonstandard comparators.
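
For reference, the kind of grouping comparator John describes would look 
something like the sketch below. This is only an illustration, assuming the 
IntPair class from the job above; it compares nothing but the first int, so 
keys such as 0:97 and 0:96 count as "the same" and land in a single reduce() 
call:

    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public static class FirstGroupingComparator extends WritableComparator {
      protected FirstGroupingComparator() {
        super(IntPair.class, true);  // true: create IntPair instances for deserialization
      }

      @Override
      public int compare(WritableComparable a, WritableComparable b) {
        int first1 = ((IntPair) a).getFirst();
        int first2 = ((IntPair) b).getFirst();
        // Only the first field decides grouping; returning 0 means "same key".
        return first1 < first2 ? -1 : (first1 == first2 ? 0 : 1);
      }
    }

    // registered on the job with:
    // job.setGroupingComparatorClass(FirstGroupingComparator.class);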
