I understand now. It also looks like the job prints the min value instead of
the max value, per my test. In the stdout I can see the following data: 3 is
the year (I faked the data myself), 99 is the max, and 0 is the min. For year
3 there are 100 records. So inside a group the key can differ from record to
record, and
context.write(key, NullWritable.get()) writes the LAST key to the output.
Since the temperature is sorted in descending order, the last key has the min
temperature.
3 99
........
3 0
number of records for this group 100
-----------------biggest key is--------------------------
3 0
public void reduce(IntPair key, Iterable<NullWritable> values,
                   Context context) throws IOException, InterruptedException {
    int count = 0;
    // Print the key once per value; the key changes as the values are read
    for (NullWritable iw : values) {
        count++;
        System.out.print(key.getFirst());
        System.out.print(' ');
        System.out.println(key.getSecond());
    }
    System.out.println("number of records for this group "
        + Integer.toString(count));
    System.out.println("-----------------biggest key is--------------------------");
    // After the loop, key holds the LAST record of the group (the min here)
    System.out.print(key.getFirst());
    System.out.print(' ');
    System.out.println(key.getSecond());
    context.write(key, NullWritable.get());
}
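The behaviour above can be reproduced without Hadoop. As far as I know, the new-API reduce context reuses a single key object and refills it as each value is deserialized, which matches what the stdout shows. The sketch below only imitates that: MutablePair is a hypothetical stand-in for IntPair, valuesFor is a made-up helper, and the data (year 3, temperatures 99 down to 0) is assumed to mirror the test. It is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class KeyReuseDemo {
    // Hypothetical stand-in for IntPair: a mutable (year, temperature) pair.
    static final class MutablePair {
        int first;
        int second;

        void set(int f, int s) {
            first = f;
            second = s;
        }

        @Override
        public String toString() {
            return first + " " + second;
        }
    }

    // Imitates the framework: ONE key object is shared, and every next() call
    // overwrites it with the key of the record whose value is being handed out.
    static Iterable<Void> valuesFor(MutablePair key, List<int[]> sortedGroup) {
        return () -> new Iterator<Void>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < sortedGroup.size();
            }

            @Override
            public Void next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int[] rec = sortedGroup.get(i++);
                key.set(rec[0], rec[1]);
                return null; // values are NullWritable-like, nothing to return
            }
        };
    }

    // Returns {key before the loop, key after the loop, number of values}.
    static String[] simulate(List<int[]> sortedGroup) {
        MutablePair key = new MutablePair();
        int[] head = sortedGroup.get(0);
        key.set(head[0], head[1]); // the key starts out as the group's first record
        String before = key.toString();
        int count = 0;
        for (Void ignored : valuesFor(key, sortedGroup)) {
            count++;
        }
        return new String[] {before, key.toString(), Integer.toString(count)};
    }

    public static void main(String[] args) {
        // Assumed data: year 3, temperatures already sorted descending, 99 down to 0.
        List<int[]> group = new ArrayList<>();
        for (int t = 99; t >= 0; t--) {
            group.add(new int[] {3, t});
        }
        String[] r = simulate(group);
        System.out.println("before loop: " + r[0]); // before loop: 3 99
        System.out.println("after loop:  " + r[1]); // after loop:  3 0
        System.out.println("values:      " + r[2]); // values:      100
    }
}
```

So a reducer that calls context.write before consuming the values, or never iterates them at all, still sees the first (max) key; writing after the loop gives the last (min) key, as in the test above.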
At 2011-08-03 11:41:23,"Daniel,Wu" <[email protected]> wrote:
>Or, I should ask: should the input of the reducer for the group of year 1900
>look like this
>(key, value pairs):
>(1900,35), null
>(1900,34), null
>(1900,33), null
>
>
>or like this:
>(1900,35), null
>(1900,35), null ==> since (1900,34) is in the same group as (1900,35),
>it uses (1900,35) as the key.
>(1900,35), null
>
>
>At 2011-08-03 10:35:51,"Daniel,Wu" <[email protected]> wrote:
>>
>>So the key of a group is determined by the first record that arrives in the
>>group. If we have 3 records in a group
>>1: (1900,35)
>>2: (1900,34)
>>3: (1900,33)
>>
>>and (1900,35) comes in as the first row, then the result key will be
>>(1900,35). When the second row (1900,34) comes in, it won't impact the
>>key of the group, meaning it will not overwrite the key (1900,35) with
>>(1900,34). Correct?
>>
>>>in the KeyComparator, these are guaranteed to come in reverse order in the
>>>second slot. That is, if 35 is the maximum temperature then (1900,35) will
>>>come before ANY other (1900,t). Then as the GroupComparator does its
>>>thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO
>>>(1900,35), and thus its (null) value is added to the (1900,35) group.
>>>The reducer then gets a (1900,35) key with an Iterable of null values,
>>>which it pretty much discards and just emits the key, which contains the
>>>maximum value.
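The sort/group split described in the quoted explanation can be modelled with plain java.util.Comparator objects. This is a sketch under assumptions, not the job's actual KeyComparator or GroupComparator classes: Pair is a hypothetical record (Java 16+) standing in for IntPair, firstKeyPerGroup is a made-up helper, and the sample keys are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortDemo {
    // Hypothetical (year, temperature) key standing in for IntPair.
    record Pair(int year, int temp) {}

    // Sort order, as the KeyComparator is described: year ascending,
    // then temperature descending, so each year's max comes first.
    static final Comparator<Pair> KEY_ORDER =
            Comparator.comparingInt(Pair::year)
                    .thenComparing(Comparator.comparingInt(Pair::temp).reversed());

    // Grouping, as the GroupComparator is described: only the year matters.
    static final Comparator<Pair> GROUP_ORDER = Comparator.comparingInt(Pair::year);

    // Sorts the keys, then starts a new group whenever GROUP_ORDER says a key
    // differs from the one before it. Returns each group's first key, which is
    // what a reducer that emits the key without iterating its values would write.
    static List<Pair> firstKeyPerGroup(List<Pair> keys) {
        List<Pair> sorted = new ArrayList<>(keys);
        sorted.sort(KEY_ORDER);
        List<Pair> firsts = new ArrayList<>();
        Pair prev = null;
        for (Pair p : sorted) {
            if (prev == null || GROUP_ORDER.compare(prev, p) != 0) {
                firsts.add(p); // first key of a new group
            }
            prev = p;
        }
        return firsts;
    }

    public static void main(String[] args) {
        // Assumed sample keys, deliberately out of order.
        List<Pair> input = List.of(
                new Pair(1900, 34), new Pair(1901, 20), new Pair(1900, 35),
                new Pair(1900, 33), new Pair(1901, 27));
        // prints [Pair[year=1900, temp=35], Pair[year=1901, temp=27]]
        System.out.println(firstKeyPerGroup(input));
    }
}
```

In a real new-API job the two roles are registered with Job.setSortComparatorClass and Job.setGroupingComparatorClass. This model only captures which key opens each group; it does not model the key being refilled while the values are iterated.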