[ https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855209#action_12855209 ]
Doug Cutting commented on AVRO-513:
-----------------------------------

Oops. This is harder than I thought. With Avro data in the key, nulls in the value, and a grouping comparator that always returns equal, Hadoop will call reduce once with the first key and an iterator over all of the null values. But we need to see each of the keys. Sigh.

Perhaps Avro's reduce could be run in a separate thread that reads from a queue fed by Hadoop's reduce?

> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The Java mapreduce API added in AVRO-493 requires reducer implementations to explicitly detect sequences of matching data.
> Rather, the reduce method might better look something like:
>   void reduce(Iterator<IN>, Collector<OUT>);
> where all equal values are passed in a single call.
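
For illustration only, here is a minimal sketch of the reduce signature proposed in the issue description, assuming the input arrives as a single sorted stream. The names GroupingReduceSketch, Reducer, Collector, and reduceGroups are hypothetical and are not Avro or Hadoop API. The sketch buffers each run of equal items in a list instead of streaming it, and it does not attempt the separate-thread, queue-fed approach suggested in the comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the Avro implementation.
final class GroupingReduceSketch {

  /** Hypothetical stand-in for an output collector. */
  interface Collector<OUT> {
    void collect(OUT datum);
  }

  /** Hypothetical reducer with the signature proposed in the issue. */
  interface Reducer<IN, OUT> {
    void reduce(Iterator<IN> group, Collector<OUT> collector);
  }

  /**
   * Walks a sorted input and calls reduce once per run of items that
   * compare as equal, passing an iterator over that run.
   * Assumption: each run is small enough to buffer in memory.
   */
  static <IN, OUT> void reduceGroups(Iterator<IN> sorted,
                                     Comparator<? super IN> comparator,
                                     Reducer<IN, OUT> reducer,
                                     Collector<OUT> collector) {
    List<IN> group = new ArrayList<>();
    while (sorted.hasNext()) {
      IN datum = sorted.next();
      // A datum that differs from the head of the current run closes the run.
      if (!group.isEmpty() && comparator.compare(group.get(0), datum) != 0) {
        reducer.reduce(group.iterator(), collector);
        group = new ArrayList<>();
      }
      group.add(datum);
    }
    // Flush the final run, if the input was not empty.
    if (!group.isEmpty()) {
      reducer.reduce(group.iterator(), collector);
    }
  }
}
```

With the sorted input a, a, b, c, c, c and natural string ordering, reduceGroups would call reduce three times: once with an iterator over the two a's, once over the single b, and once over the three c's. The reducer no longer has to detect the boundaries between matching items itself.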