[
https://issues.apache.org/jira/browse/BEAM-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Moravek resolved BEAM-6812.
---------------------------------
Resolution: Fixed
Fix Version/s: 2.12.0
> Convert keys to ByteArray in Combine.perKey for Spark
> -----------------------------------------------------
>
> Key: BEAM-6812
> URL: https://issues.apache.org/jira/browse/BEAM-6812
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Ankit Jhalaria
> Assignee: Ankit Jhalaria
> Priority: Critical
> Fix For: 2.12.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> * During calls to Combine.perKey, we want the keys to have a consistent
> hashCode when invoked from different JVMs.
> * However, while testing this at our company we found that when using
> protobuf messages as combine keys, the hashCodes can differ for the same
> key when computed on different JVMs. This results in duplicate records.
> * The `ByteArray` class in Spark has a stable hash code when dealing with
> array contents.
> * GroupByKey correctly converts keys to `ByteArray` and uses coders for
> serialization.
> * The fix applies the same conversion when dealing with combines.
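The idea behind the fix can be sketched as follows. This is a minimal, hypothetical illustration (the class name `KeyBytes` is invented here, not Beam's or Spark's actual type): a raw `byte[]` uses identity-based `hashCode`, so equal encoded keys can land in different Spark partitions on different JVMs, while a wrapper that hashes the array's contents is stable, assuming the key coder is deterministic.

```java
import java.util.Arrays;

// Hypothetical sketch of a ByteArray-style key wrapper. A plain byte[]
// inherits Object's identity hashCode, so two arrays with identical
// contents hash differently; hashing the contents instead makes the key
// stable across JVMs and shuffle partitions.
public class KeyBytes {
    private final byte[] bytes;

    public KeyBytes(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof KeyBytes
            && Arrays.equals(bytes, ((KeyBytes) o).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash: identical encoded keys always map to the
        // same value, unlike byte[].hashCode() (identity-based).
        return Arrays.hashCode(bytes);
    }
}
```

Under this scheme, keys are first serialized with their coder and then wrapped before the combine, mirroring what GroupByKey already does.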
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)