On 06/15/2010 04:54 PM, Jacob R Rideout wrote:
> Does anyone have a suggestion for implementing a Job using the
> o.a.avro.mapred classes where it is necessary to maintain a key and
> (logical) value? For example, consider WordCount with a combiner. If
> two counts of the same word are seen, then the combiner would emit an
> Avro record with a count of two. This would no longer equal the
> record with a count of one, and presuming that word was seen once in
> a separate map task, the partitioner might send it to a different
> reduce task. This would cause the word to appear twice in the reduce
> output with different counts.

The WordCount record's schema specifies that the count field should be ignored when ordering (Avro schemas express this with the field attribute "order": "ignore"). Avro's hashCode and compareTo implementations respect this, so WordCount records for the same word but with different counts compare as equal and are all sent to the same partition.
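
To make the effect concrete, here is a minimal plain-Java sketch of the behaviour described above. It is not Avro's own code: the WordCount class and the partitionFor helper are hypothetical stand-ins written for illustration, and the partition formula is assumed to follow the hash-modulo scheme of Hadoop's default HashPartitioner.

```java
// Hypothetical stand-in for an Avro WordCount record (illustration only).
// Only `word` takes part in equality, hashing and ordering; `count` is
// ignored, which is the effect the ignored sort order is described as having.
final class WordCount implements Comparable<WordCount> {
    final String word;
    final long count;

    WordCount(String word, long count) {
        this.word = word;
        this.count = count;
    }

    @Override
    public boolean equals(Object o) {
        // Two records are equal when their words match, whatever the counts.
        return o instanceof WordCount && word.equals(((WordCount) o).word);
    }

    @Override
    public int hashCode() {
        // The hash depends on the word alone.
        return word.hashCode();
    }

    @Override
    public int compareTo(WordCount other) {
        // Ordering also looks at the word alone.
        return word.compareTo(other.word);
    }

    // Assumed partitioning rule: non-negative hash modulo the reducer count.
    static int partitionFor(WordCount wc, int numPartitions) {
        return (wc.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this definition, new WordCount("avro", 1) and new WordCount("avro", 2) hash identically, so a combiner that sums counts does not change which reduce task receives the word.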

Doug
