Does anyone have a suggestion for implementing a Job using the o.a.avro.mapred classes where it is necessary to maintain a key and a (logical) value? For example, consider WordCount with a combiner. If two counts of the same word are seen, the combiner emits an Avro record with a count of two. That record no longer equals a record with a count of one, so if a separate map task saw the word once, the partitioner might send the two records to different reduce tasks. The word would then appear twice in the reduce output, with different counts. I'm considering subclassing AvroKeyComparator so that it compares the datum of one field in the record rather than the whole datum, although this approach is necessarily job-specific. Any other thoughts?
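
To make the problem concrete, here is a rough sketch in plain Java. It does not use the Avro or Hadoop APIs: WordCount, BY_WORD, partitionByRecord and partitionByWord are hypothetical stand-ins for the record, the field-only comparator and the partitioner, so treat it as an illustration of the idea rather than working job code.

```java
import java.util.Comparator;
import java.util.Objects;

public class KeyFieldSketch {
    // Hypothetical stand-in for an Avro record holding a key ("word") and a logical value ("count").
    static final class WordCount {
        final String word;
        final long count;

        WordCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        // Equality and hash cover every field, mirroring a comparison of the whole datum.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof WordCount)) return false;
            WordCount other = (WordCount) o;
            return word.equals(other.word) && count == other.count;
        }

        @Override
        public int hashCode() {
            return Objects.hash(word, count);
        }
    }

    // Assumed fix: order (and group) records by the key field only, ignoring the count.
    static final Comparator<WordCount> BY_WORD =
            Comparator.comparing((WordCount wc) -> wc.word);

    // Partitioning on the whole record: the count leaks into the partition choice.
    static int partitionByRecord(WordCount wc, int numPartitions) {
        return (wc.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Partitioning on the key field only: same word, same partition, whatever the count.
    static int partitionByWord(WordCount wc, int numPartitions) {
        return (wc.word.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        WordCount combined = new WordCount("avro", 2); // emitted by a combiner
        WordCount single = new WordCount("avro", 1);   // emitted by another map task

        System.out.println(combined.equals(single));                // false
        System.out.println(BY_WORD.compare(combined, single) == 0); // true
        System.out.println(
                partitionByWord(combined, 4) == partitionByWord(single, 4)); // true
    }
}
```

The point of the sketch is that both the comparator and the partitioner have to look at the key field alone. Fixing only the comparator would still let the two records for the same word reach different reduce tasks.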
Thanks, Jacob Rideout
