On Thu, 4 Aug 2011 14:07:12 +0800 (CST), "Daniel,Wu" <[email protected]>
wrote:
> I am using the new API (the release is from Cloudera).  We can see
> from the output that for each call of the reduce function, 100 records
> were processed, but as reduce is defined as
> reduce(IntPair key, Iterable<NullWritable> values, Context context),
> the key should be fixed (not change) during every single execution.
> The strange thing is that on each loop over Iterable<NullWritable>
> values, the key is different!  Using your explanation, the same
> information (0:97) should be repeated 100 times, but actually it is
> 0:97, 0:96, ... 0:0, as below

Ah, but they're NOT different! That's the whole point!

Think carefully: how does Hadoop decide what keys are "the same" when
sorting and grouping reducer inputs?  It uses a comparator.  If the
comparator says compare(key1,key2)==0, then as far as Hadoop is concerned
the keys are the same.

So here the comparator only really checks the first int in the pair:

"compare(0:97,0:96)?  well let's compare 0 and 0...
Integer.compare(0,0)==0, so these are the same key."

You have to be careful about the semantics of "equality" whenever you're
using nonstandard comparators.
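
If you want to watch this happen, note that Hadoop reuses the key
object and re-fills it as you advance the values iterator, so printing
the key on each iteration exposes the underlying keys of the group.
A rough sketch, reusing the hypothetical getFirst()/getSecond()
accessors from above:

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SecondarySortReducer
            extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {

        @Override
        protected void reduce(IntPair key, Iterable<NullWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            for (NullWritable ignored : values) {
                // One reduce() call, yet key.getSecond() walks 97, 96, ... 0:
                // the grouping comparator merged all these keys into one group.
                System.err.println(key.getFirst() + ":" + key.getSecond());
            }
        }
    }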
