[ https://issues.apache.org/jira/browse/CRUNCH-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-437.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 0.11.0
                   0.8.4

Fixed in 0.8 and master.

> Fix Crunch Spark duplicate value aggregation
> --------------------------------------------
>
>                 Key: CRUNCH-437
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-437
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>             Fix For: 0.8.4, 0.11.0
>
>         Attachments: CRUNCH-437.patch
>
>
> The current Crunch-on-Spark mapside combiner uses a Multimap of key-value
> pairs to cache values for local aggregation. This is awesome, except it means
> that identical key-value outputs before a shuffle will only have one copy in
> the Multimap, which means that the aggregation counts may not be correct. We
> should fix it to use a proper Map<K, List<V>> to ensure that duplicate values
> are aggregated correctly.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
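
The duplicate-loss behavior described in the issue can be illustrated with a small sketch. It is not the CRUNCH-437 patch and does not use Crunch or Guava. It assumes the Multimap in question keeps a set of values per key, modeled here as a JDK Map<String, Set<Long>>, and compares it with the Map<K, List<V>> layout the issue proposes. The class name CombinerSketch, the sample data, and the summing aggregator are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a stand-in for a map-side combiner cache, not Crunch code.
class CombinerSketch {

    // Hypothetical map output: the pair ("a", 1) is emitted three times.
    static List<Map.Entry<String, Long>> sample() {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1L));
        pairs.add(new SimpleEntry<>("a", 1L));
        pairs.add(new SimpleEntry<>("a", 1L));
        pairs.add(new SimpleEntry<>("b", 2L));
        return pairs;
    }

    // Assumed set-like cache: identical (key, value) pairs collapse to one entry.
    static Map<String, Long> sumWithSetValues(List<Map.Entry<String, Long>> pairs) {
        Map<String, Set<Long>> cache = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            cache.computeIfAbsent(p.getKey(), k -> new HashSet<>()).add(p.getValue());
        }
        return sum(cache);
    }

    // List-valued cache, as the issue proposes: every emitted value is retained.
    static Map<String, Long> sumWithListValues(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> cache = new HashMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            cache.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return sum(cache);
    }

    // Local aggregation step: sum the cached values for each key.
    private static <C extends Collection<Long>> Map<String, Long> sum(Map<String, C> cache) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, C> e : cache.entrySet()) {
            long total = 0;
            for (long v : e.getValue()) {
                total += v;
            }
            totals.put(e.getKey(), total);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println("set-valued cache:  " + sumWithSetValues(sample()));
        System.out.println("list-valued cache: " + sumWithListValues(sample()));
    }
}
```

With the sample input, the set-valued cache keeps a single 1 for key "a", so the local sum undercounts. The list-valued cache keeps all three and the sum reflects every emitted pair.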