Gerard Maas created SPARK-2620:
----------------------------------

             Summary: case class cannot be used as key for reduce
                 Key: SPARK-2620
                 URL: https://issues.apache.org/jira/browse/SPARK-2620
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
         Environment: reproduced on spark-shell local[4]
            Reporter: Gerard Maas
            Priority: Critical


Using a case class as a key doesn't seem to work properly in Spark 1.0.0: identical case class keys are not merged by key-based operations.

A minimal example:

case class P(name: String)
val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
sc.parallelize(ps).map(x => (x, 1)).reduceByKey((x, y) => x + y).collect
[Spark shell local mode] res: Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(alice),1), (P(charly),1))

This contrasts with the expected behavior, which should be equivalent to keying on the field itself:
sc.parallelize(ps).map(x => (x.name, 1)).reduceByKey((x, y) => x + y).collect
Array[(String, Int)] = Array((charly,1), (alice,1), (bob,2))

groupByKey and distinct exhibit the same behavior.
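For reference, the failure appears to be specific to the shell: in plain compiled Scala (no Spark involved), case classes get structural equals and hashCode, so collapsing duplicate case class keys works as expected. A minimal sketch, with the hypothetical object name `Repro`:

```scala
// Plain compiled Scala, no Spark: case classes have structural equals and
// hashCode, so duplicate keys merge correctly outside the REPL.
case class P(name: String)

object Repro {
  // Count occurrences of each key -- the same merge reduceByKey performs.
  def counts(): Map[P, Int] = {
    val ps = Seq(P("alice"), P("bob"), P("charly"), P("bob"))
    ps.groupBy(identity).map { case (k, v) => (k, v.size) }
  }

  def main(args: Array[String]): Unit =
    println(counts())  // P(bob) merges into a single entry with count 2
}
```

That this works when compiled suggests the problem lies in how the spark-shell wraps REPL-defined classes rather than in the case class machinery itself.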



--
This message was sent by Atlassian JIRA
(v6.2#6252)
