On 06/15/2010 04:54 PM, Jacob R Rideout wrote:
Does anyone have a suggestion for implementing a Job using the o.a.avro.mapred classes where it is necessary to maintain a key and (logical) value? For example, consider WordCount with a combiner. If two counts of the same word are seen, the combiner would emit an Avro record with a count of two. This record would no longer equal a record with a count of one, so if a separate map task saw that word only once, the partitioner might send the two records to different reduce tasks. The word would then appear twice in the reduce outputs with different counts.
The WordCount record's schema specifies that the count field should be ignored when ordering. Avro's hashCode and compareTo implementations respect this, so WordCounts for the same word but with different counts are all sent to the same partition.
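
To illustrate the effect, here is a minimal plain-Java sketch. It does not use any Avro classes: the WordCount class, its field names, and the partition helper are hypothetical stand-ins, and the schema in the comment is only an assumption about how the record might be declared using the Avro "order": "ignore" field attribute. The partition function follows the style of Hadoop's hash partitioning (non-negative hash modulo the number of reduce tasks).

```java
// Plain-Java illustration only; no Avro classes are used here.
// A schema along these lines (field names are assumptions) would mark the count as ignored:
//   {"type": "record", "name": "WordCount", "fields": [
//     {"name": "word",  "type": "string"},
//     {"name": "count", "type": "long", "order": "ignore"}]}
final class WordCountSketch {

    /** Hypothetical stand-in for the WordCount record. */
    static final class WordCount implements Comparable<WordCount> {
        final String word;
        final long count;

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        // Only the word contributes to the hash; the count is skipped.
        @Override
        public int hashCode() {
            return word.hashCode();
        }

        // Kept consistent with compareTo: records with the same word are equal.
        @Override
        public boolean equals(Object o) {
            return o instanceof WordCount && ((WordCount) o).word.equals(word);
        }

        // Ordering looks at the word only, so the count never affects sorting.
        @Override
        public int compareTo(WordCount other) {
            return word.compareTo(other.word);
        }
    }

    /** Hash partitioning: non-negative hash modulo the number of reduce tasks. */
    static int partition(WordCount key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        WordCount fromMapper = new WordCount("avro", 1);   // emitted directly by a map task
        WordCount fromCombiner = new WordCount("avro", 2); // emitted by a combiner
        // Both records land in the same partition and compare as equal.
        System.out.println(partition(fromMapper, 4) == partition(fromCombiner, 4));
        System.out.println(fromMapper.compareTo(fromCombiner) == 0);
    }
}
```

Because the count takes no part in hashing or comparison, a record for a word with a count of one and a record for the same word with a count of two reach the same reduce task and are grouped together there.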
Doug
