[ https://issues.apache.org/jira/browse/CRUNCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093788#comment-14093788 ]
Gabriel Reid commented on CRUNCH-458:
-------------------------------------

Minor issue with that patch, the PCOL_CMP isn't transitive due to possible integer overflow on the hashCode comparison. I think it would need to be something like this:

{code}
if (left.hashCode() != right.hashCode()) {
  return left.hashCode() < right.hashCode() ? -1 : 1;
}
{code}

+1 apart from that

> Eliminate potentially random MR split-point decisions
> -----------------------------------------------------
>
>                 Key: CRUNCH-458
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-458
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Josh Wills
>         Attachments: CRUNCH-458.patch, CRUNCH-458b.patch
>
>
> I'm running into a pipeline in which the decision of where to split two
> dependent jobs seems to be random from run-to-run (I only noticed it b/c one
> of the runs causes the pipeline to throw an NPE, and the other does not.) I'd
> like to investigate this and try to eliminate any potential sources of
> randomness in the way that two dependent GBK operations are split.
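
For illustration of the overflow concern raised in the comment, here is a minimal standalone sketch. The patch itself is not quoted in this message, so the sketch assumes the problematic form was a subtraction of the two hash codes; the class and method names (HashCodeOrdering, bySubtraction, byComparison) are hypothetical and not part of Crunch.

```java
public class HashCodeOrdering {

    // Assumed problematic form: the difference wraps around when the two
    // values are more than Integer.MAX_VALUE apart, so the sign can be wrong.
    static int bySubtraction(int leftHash, int rightHash) {
        return leftHash - rightHash;
    }

    // Overflow-safe form, equivalent in effect to the explicit "<" test
    // suggested in the comment (Integer.compare is available since Java 7).
    static int byComparison(int leftHash, int rightHash) {
        return Integer.compare(leftHash, rightHash);
    }

    public static void main(String[] args) {
        // Three stand-in hash codes with a < b < c numerically.
        int a = -2_000_000_000;
        int b = 0;
        int c = 2_000_000_000;

        // Subtraction reports a < b and b < c, but a - c overflows to a
        // positive value, so it also reports a > c: transitivity is broken.
        System.out.println("subtraction: a?b=" + Integer.signum(bySubtraction(a, b))
                + " b?c=" + Integer.signum(bySubtraction(b, c))
                + " a?c=" + Integer.signum(bySubtraction(a, c)));

        // Direct comparison gives a consistent ordering for all three pairs.
        System.out.println("comparison:  a?b=" + byComparison(a, b)
                + " b?c=" + byComparison(b, c)
                + " a?c=" + byComparison(a, c));
    }
}
```

A comparator that violates transitivity can make sort order depend on the order in which elements happen to be compared, which is the kind of run-to-run variation the issue is trying to eliminate.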