[
https://issues.apache.org/jira/browse/SPARK-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025631#comment-17025631
]
Kaspar Fischer commented on SPARK-3847:
---------------------------------------
This issue is still present in Spark 2.4.0. The PR mentioned above didn’t
actually result in a code change that got committed.
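Until this is addressed, the usual workaround (also described in the write-up linked below) is to key on the enum's name or ordinal rather than the enum value itself, since those produce hash codes that are deterministic across JVMs. A minimal sketch, using java.time.DayOfWeek as a stand-in for the reporter's Kind enum:

```scala
object EnumKeyWorkaround {
  // Java's Enum.hashCode() is the identity hash code, which differs
  // between JVM instances, so an enum is unsafe as a shuffle key.
  // name() returns a String whose hashCode is computed from its
  // characters and is therefore identical on every JVM.
  def stableKey(k: java.time.DayOfWeek): String = k.name()

  def main(args: Array[String]): Unit = {
    // e.g. keyBy(m => stableKey(m.getHeader.getKind)) instead of
    // keyBy(_.getHeader.getKind) in the reporter's example.
    println(stableKey(java.time.DayOfWeek.MONDAY))
  }
}
```

The same applies to groupBy, countByValue, distinct, and joins: any Spark operation that shuffles by key will misbehave if the key's hashCode varies between executors.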
> Enum.hashCode is only consistent within the same JVM
> ----------------------------------------------------
>
> Key: SPARK-3847
> URL: https://issues.apache.org/jira/browse/SPARK-3847
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.1.0
> Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04
> Reporter: Nathan Bijnens
> Priority: Major
> Labels: bulk-closed, enum
>
> When using Java enums as keys in some operations, the results can be very
> unexpected. The issue is that Java's Enum.hashCode returns the identity hash
> code (derived from the object's memory address), which differs between JVMs.
> {code}
> messages.filter(_.getHeader.getKind == Kind.EVENT).count
> >> 503650
> val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
> tmp.map(_.getHeader.getKind).countByValue
> >> Map(EVENT -> 1389)
> {code}
> Because this is really a JVM-level issue, Spark should either reject enums
> as keys with an error or implement a workaround.
> A good write-up of the issue (including a workaround) can be found here:
> http://dev.bizo.com/2014/02/beware-enums-in-spark.html
> More background on enum hash codes:
> https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode
> And some related issues (most of them rejected) in the Oracle Java bug database:
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8050217
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7190798
--
This message was sent by Atlassian Jira
(v8.3.4#803005)