[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282323#comment-14282323 ]
Frank Rosner commented on SPARK-2620:
-------------------------------------

The issue is caused by the fact that pattern matching of case classes does not work in the Spark shell. Reducing by key (like grouping, the distinct operator, etc.) relies on equality checking of objects. Case classes implement equality checking by using pattern matching to perform the type check and conversion. When I implement a custom equals method for the case class that checks the type with {{isInstanceOf}} and casts with {{asInstanceOf}}, {{==}} works, and so do distinct and the key-based operations.

Does it make sense to break this down and rework this issue to cover the actual problem rather than a symptom? Are there any plans to fix it? It requires extra effort when working with case classes in the REPL, and in particular it blocks rapid data exploration and analytics. I will try to dig into the code and see whether I can find a solution, but there seems to have been an ongoing discussion for quite a while now.
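A minimal sketch of the workaround described above, using the {{P}} case class from the ticket's example (the exact overrides are illustrative, not the committed fix):

```scala
// Workaround sketch: give the case class an equals that avoids pattern
// matching, so == behaves correctly inside the Spark shell.
case class P(name: String) {
  override def equals(other: Any): Boolean =
    other.isInstanceOf[P] && other.asInstanceOf[P].name == name

  // Keep hashCode consistent with equals so hash-partitioned shuffles
  // (reduceByKey, groupByKey, distinct) place equal keys together.
  override def hashCode: Int = name.hashCode
}
```

With a definition like this, {{sc.parallelize(ps).map(x => (x, 1)).reduceByKey(_ + _)}} should merge the duplicate {{P("bob")}} entries instead of emitting them separately.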
> case class cannot be used as key for reduce
> -------------------------------------------
>
>                 Key: SPARK-2620
>                 URL: https://issues.apache.org/jira/browse/SPARK-2620
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 1.0.0, 1.1.0
>         Environment: reproduced on spark-shell local[4]
>            Reporter: Gerard Maas
>            Assignee: Tobias Schlatter
>            Priority: Critical
>              Labels: case-class, core
>
> Using a case class as a key doesn't seem to work properly on Spark 1.0.0
> A minimal example:
> case class P(name: String)
> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
> sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
> [Spark shell local mode] res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(abe),1), (P(charly),1))
> In contrast to the expected behavior, that should be equivalent to:
> sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
> Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
> groupByKey and distinct also present the same behavior.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)