[ https://issues.apache.org/jira/browse/CRUNCH-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Vargo updated CRUNCH-528:
---------------------------------
    Summary: Pair: Integer overflow during comparison can cause inconsistent sort.  (was: Pair: Integer overflow during comparison cause inconsistent sort.)

> Pair: Integer overflow during comparison can cause inconsistent sort.
> ---------------------------------------------------------------------
>
>                 Key: CRUNCH-528
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-528
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Vargo
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: 0001-Pair-Fix-comparison-for-large-hash-codes.patch
>
>
> Pair uses the hash code of each value for comparison if the values are not
> themselves comparable. If two hash codes are far enough apart that their
> difference does not fit in an int, the subtraction wraps around. This results
> in a comparison function that is not transitive.
> Among other things, this breaks joins in the in-memory pipeline, since the
> in-memory shuffler uses a TreeMap if the key type is Comparable. Since the
> key in a join is a Pair of the original key and a join tag, the key is always
> comparable. With a non-transitive comparison function, it is possible for the
> two join tags of the original key to sort differently, so that the two join
> tags are not adjacent for the original key. As a result, either the cross
> product erroneously produces no values in the case of an inner join, or null
> values appear when they should not in the case of an outer join.
> As a workaround, ensure that the key used in a join is comparable.
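
To illustrate the overflow described in the report, here is a minimal sketch in plain Java. It is not the Crunch source and not necessarily what the attached patch does; the class and method names (HashCompareSketch, compareBySubtraction, compareSafely) are hypothetical, and the three int values stand in for hash codes of non-comparable values.

```java
final class HashCompareSketch {
    // Subtraction-based comparison, the general pattern the report describes.
    // The result wraps around when h1 - h2 does not fit in an int.
    static int compareBySubtraction(int h1, int h2) {
        return h1 - h2;
    }

    // Overflow-safe alternative: Integer.compare never subtracts its operands.
    static int compareSafely(int h1, int h2) {
        return Integer.compare(h1, h2);
    }

    public static void main(String[] args) {
        // Stand-ins for three hash codes (assumed values, chosen to trigger the wrap).
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // a < b and b < c, so transitivity requires a < c ...
        System.out.println(compareBySubtraction(a, b) < 0); // true
        System.out.println(compareBySubtraction(b, c) < 0); // true
        // ... but a - c wraps to a positive int, so a is reported as greater than c.
        System.out.println(compareBySubtraction(a, c) < 0); // false

        // The safe comparison orders all three consistently.
        System.out.println(compareSafely(a, c) < 0);        // true
    }
}
```

A sorted structure such as TreeMap relies on the comparison being consistent, so an ordering like the one above can place related keys in unexpected positions, which matches the non-adjacent join tags described in the report.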