We usually use something like values.next() to loop over every row in a
group, but I don't see any code here that iterates over the list. At the very
least it would need to get the first row in the list, with something like
values.get().
Or does NullWritable.get() return the first row in the group?
static class MaxTemperatureReducer extends MapReduceBase
    implements Reducer<IntPair, NullWritable, IntPair, NullWritable> {
  public void reduce(IntPair key, Iterator<NullWritable> values,
      OutputCollector<IntPair, NullWritable> output, Reporter reporter)
      throws IOException {
    output.collect(key, NullWritable.get());
  }
}
>"If we group values in the reducer by the year part of the key,
>then we will see all the records for the same year in one reduce group.
>And since they are sorted by temperature in descending order, the first is
>the maximum temperature."
At 2011-08-02 21:34:57,"John Armstrong" <[email protected]> wrote:
>On Tue, 2 Aug 2011 21:25:47 +0800 (CST), "Daniel,Wu" <[email protected]>
>wrote:
>> at page 243:
>> Per my understanding, the reducer is supposed to output the first value
>> (the maximum) for each year. But I just don't know how it works.
>>
>> suppose we have the data
>> 1901 200
>> 1901 300
>> 1901 400
>>
>> Since grouping is done by year, we have only one group, but we have 3
>> different keys, as the key is a combination of year and temperature. For
>> the reduce, the output should be a key, list(value) pair. Since we have 3
>> keys, we should output 3 rows, but since we have only one group, we only
>> output 1 row. So where is the conflict? What do I misunderstand?
>
>Keep reading the section in the book:
>
>"This still isn't enough to achieve our goal, however. A partitioner
>ensures only that one reducer receives all the records for a year; it
>doesn't change the fact that the reducer groups by key within the
>partition... The final piece of the puzzle is the setting to control the
>grouping. If we group values in the reducer by the year part of the key,
>then we will see all the records for the same year in one reduce group.
>And since they are sorted by temperature in descending order, the first is
>the maximum temperature."
>
>That is, in that example they also change the way the reducer groups its
>inputs.
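
To make the grouping behaviour concrete, here is a minimal plain-Java sketch
that simulates the reduce side without any Hadoop classes. Every name in it
(SecondarySortSketch, Pair, KEY_ORDER, GROUP_ORDER, reduceAll) is hypothetical
and only stands in for the book's IntPair key, sort comparator and grouping
comparator; it is not the book's code. It assumes records are sorted by year
ascending and temperature descending, and that reduce runs once per group and
sees the first key of that group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of secondary sort; no Hadoop classes are involved.
final class SecondarySortSketch {

    // Hypothetical stand-in for the composite IntPair key: (year, temperature).
    static final class Pair {
        final int year;
        final int temp;

        Pair(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }
    }

    // Sort order of the keys: year ascending, then temperature descending.
    static final Comparator<Pair> KEY_ORDER = (a, b) -> {
        int byYear = Integer.compare(a.year, b.year);
        return byYear != 0 ? byYear : Integer.compare(b.temp, a.temp);
    };

    // Grouping order: only the year decides whether two keys share a group.
    static final Comparator<Pair> GROUP_ORDER =
            (a, b) -> Integer.compare(a.year, b.year);

    // Simulates one reduce call per group: emit the first key of each group
    // and ignore the rest, as output.collect(key, NullWritable.get()) does.
    static List<Pair> reduceAll(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(KEY_ORDER);

        List<Pair> output = new ArrayList<>();
        Pair previous = null;
        for (Pair current : sorted) {
            boolean startsNewGroup =
                    previous == null || GROUP_ORDER.compare(previous, current) != 0;
            if (startsNewGroup) {
                output.add(current);
            }
            previous = current;
        }
        return output;
    }

    public static void main(String[] args) {
        List<Pair> input = Arrays.asList(
                new Pair(1901, 200), new Pair(1901, 300),
                new Pair(1901, 400), new Pair(1902, 150));
        for (Pair p : reduceAll(input)) {
            System.out.println(p.year + " " + p.temp);
        }
        // prints:
        // 1901 400
        // 1902 150
    }
}
```

In this sketch the three 1901 keys fall into one group, so only one row comes
out for 1901, and it carries the highest temperature because that key sorts
first. This is why the reducer in the question never has to iterate over
values: the data lives in the key, and NullWritable.get() only returns the
singleton placeholder value, not a row from the group.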