[
https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083606#comment-17083606
]
Alexandre Archambault commented on SPARK-2620:
----------------------------------------------
FYI, adding "final" in front of "case class P(name: String)" fixes that.
{{final case class P(name: String)}}
Beware that this works thanks to a bug, described around
[https://github.com/scala/bug/issues/4440#issuecomment-365858185] in
particular, that makes final case classes ignore their outer reference in
equals. It's still not "fixed" as of scala 2.12.10 / 2.13.1, so adding final
still works as a workaround.
[https://github.com/scala/bug/issues/11940] discusses maybe allowing to tweak
outer references comparisons, which could offer a way to properly fix the issue
here. (One could add an equals method to the outer wrapper, if
[https://github.com/scala/bug/issues/11940] gets fixed, fixing the issue here.)
> case class cannot be used as key for reduce
> -------------------------------------------
>
> Key: SPARK-2620
> URL: https://issues.apache.org/jira/browse/SPARK-2620
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 1.0.0, 1.1.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 2.0.0, 2.1.0,
> 2.2.0, 2.3.0
> Environment: reproduced on spark-shell local[4]
> Reporter: Gerard Maas
> Assignee: Tobias Schlatter
> Priority: Critical
> Labels: bulk-closed, case-class, core
>
> Using a case class as a key doesn't seem to work properly on Spark 1.0.0
> A minimal example:
> case class P(name:String)
> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
> [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1),
> (P(bob),1), (P(abe),1), (P(charly),1))
> In contrast to the expected behavior, that should be equivalent to:
> sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect
> Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
> groupByKey and distinct also present the same behavior.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]