[
https://issues.apache.org/jira/browse/BEAM-6812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812247#comment-16812247
]
Jozef Vilcek commented on BEAM-6812:
------------------------------------
I was hit by this when upgrading from Beam 2.8.0 to 2.10.0. Thanks for the fix.
> Convert keys to ByteArray in Combine.perKey for Spark
> -----------------------------------------------------
>
> Key: BEAM-6812
> URL: https://issues.apache.org/jira/browse/BEAM-6812
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Affects Versions: 2.10.0
> Reporter: Ankit Jhalaria
> Assignee: Ankit Jhalaria
> Priority: Critical
> Fix For: 2.12.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> * During calls to Combine.perKey, we want the keys used to have a consistent
> hashCode when invoked from different JVMs.
> * However, while testing this in our company, we found that when protobuf
> messages are used as keys during a combine, the hashCodes can differ for the
> same key across JVMs. This results in duplicates.
> * The `ByteArray` class in Spark has a stable hashCode when dealing with
> arrays as well.
> * GroupByKey correctly converts keys to `ByteArray` and uses coders for
> serialization.
> * The fix does something similar when dealing with combines; a sketch of the
> idea follows below.
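>
> A minimal sketch of the idea, assuming a wrapper similar in spirit to the
> Spark runner's `ByteArray` class. The `EncodedKey` name is illustrative, not
> the actual class used by the fix; `CoderUtils.encodeToByteArray` is Beam's
> own utility.
> {code:java}
> import java.util.Arrays;
>
> import org.apache.beam.sdk.coders.Coder;
> import org.apache.beam.sdk.coders.CoderException;
> import org.apache.beam.sdk.util.CoderUtils;
>
> // Illustrative wrapper with a content-based hashCode, similar in spirit
> // to the Spark runner's ByteArray class.
> final class EncodedKey {
>   private final byte[] bytes;
>
>   EncodedKey(byte[] bytes) {
>     this.bytes = bytes;
>   }
>
>   // Encode the key with its coder so equality/hashing is over the encoded
>   // bytes, not over the key object's (possibly JVM-dependent) hashCode.
>   static <K> EncodedKey of(Coder<K> keyCoder, K key) throws CoderException {
>     return new EncodedKey(CoderUtils.encodeToByteArray(keyCoder, key));
>   }
>
>   @Override
>   public boolean equals(Object o) {
>     return o instanceof EncodedKey && Arrays.equals(bytes, ((EncodedKey) o).bytes);
>   }
>
>   @Override
>   public int hashCode() {
>     // Arrays.hashCode is deterministic for the same byte content, so the
>     // same key hashes identically on every JVM, which keeps Spark's
>     // partitioning of the combine consistent.
>     return Arrays.hashCode(bytes);
>   }
> }
> {code}
> With keys wrapped like this before the combine and decoded afterwards, equal
> keys land in the same Spark partition regardless of which JVM hashed them.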