Or perhaps I should ask: should the input to the reducer for the year-1900 group be key/value pairs like

(1900,35), null
(1900,34), null
(1900,33), null

or like

(1900,35), null
(1900,35), null  ==> since (1900,34) is in the same group as (1900,35), it uses (1900,35) as the key
(1900,35), null

At 2011-08-03 10:35:51, "Daniel,Wu" <[email protected]> wrote:
>
>So the key of a group is determined by the first record to arrive in the group.
>If we have 3 records in a group
>1: (1900,35)
>2: (1900,34)
>3: (1900,33)
>
>and (1900,35) comes in as the first row, then the result key will be (1900,35).
>When the second row (1900,34) comes in, it won't impact the key of the
>group, meaning it will not overwrite the key (1900,35) with (1900,34), correct?
>
>>>In the KeyComparator, these are guaranteed to come in reverse order in the
>>>second slot. That is, if 35 is the maximum temperature then (1900,35) will
>>>come before ANY other (1900,t). Then as the GroupComparator does its
>>>thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO
>>>(1900,35), and thus its (null) value is added to the (1900,35) group. The
>>>reducer then gets a (1900,35) key with an Iterable of null values, which it
>>>pretty much discards and just emits the key, which contains the maximum
>>>value.
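
To make the quoted description concrete, here is a small sketch in plain Python (not Hadoop code). It only models the explanation above: sort with a key comparator (year ascending, temperature descending), then group by year and report the first key of each group. The names key_comparator, group_key and simulate are made up for illustration, and the sketch assumes the group is labelled by its first record, so it does not by itself settle what the key object holds while the reducer iterates over the values.

```python
from functools import cmp_to_key
from itertools import groupby


def key_comparator(a, b):
    """Stand-in for the KeyComparator: year ascending, temperature descending."""
    (year_a, temp_a), (year_b, temp_b) = a, b
    if year_a != year_b:
        return -1 if year_a < year_b else 1
    if temp_a != temp_b:
        return -1 if temp_a > temp_b else 1
    return 0


def group_key(key):
    """Stand-in for the GroupComparator: keys with the same year compare equal."""
    return key[0]


def simulate(records):
    """Sort (key, value) records, then group them the way the quoted text describes."""
    ordered = sorted(
        records,
        key=cmp_to_key(lambda x, y: key_comparator(x[0], y[0])),
    )
    output = []
    for _, group in groupby(ordered, key=lambda rec: group_key(rec[0])):
        group = list(group)
        first_key = group[0][0]            # assumption: the first record labels the group
        values = [value for _, value in group]
        output.append((first_key, values))
    return output


records = [
    ((1900, 34), None),
    ((1900, 35), None),
    ((1901, 20), None),
    ((1900, 33), None),
    ((1901, 22), None),
]
result = simulate(records)
for key, values in result:
    print(key, values)
```

Under that assumption, the 1900 group comes out under the key (1900, 35) with three null values, which corresponds to the second of the two layouts asked about above.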
