Tofigh created SPARK-16325:
------------------------------

             Summary: reduceByKey requires an implicit ordering which it never uses
                 Key: SPARK-16325
                 URL: https://issues.apache.org/jira/browse/SPARK-16325
             Project: Spark
          Issue Type: Bug
            Reporter: Tofigh
            Priority: Minor


Assume there is a case class as follows:

case class UnorderedPair[A](left: A, right: A) extends Serializable {

  override def equals(obj: Any): Boolean = obj match {
    // Wildcard type pattern: the element type is erased at runtime anyway.
    case other: UnorderedPair[_] =>
      (this.left == other.left && this.right == other.right) ||
        (this.left == other.right && this.right == other.left)
    case _ => false
  }

  override def hashCode(): Int = left.hashCode() * right.hashCode()

  def toSeq(): Seq[A] = Seq(left, right)
}

and assume an RDD whose elements are (UnorderedPair[Int], Seq[Long]) pairs:

val rdd = sc.parallelize(Seq(
  (UnorderedPair(12, 14), Seq(123L)),
  (UnorderedPair(12, 14), Seq(123L))
))

Then the following code:

rdd.reduceByKey(_ ++ _)

fails with an error stating that an implicit Ordering is required.
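
For comparison, here is a minimal sketch of the same reduction over plain Scala collections, with no Spark involved. It only illustrates that merging values by key needs nothing beyond equals and hashCode on the key. The object ReduceLocalDemo and the helper reduceByKeyLocal are hypothetical names made up for this sketch, not Spark APIs, and the second input pair is deliberately swapped to exercise the unordered equality.

```scala
// Same key type as in the report; equality ignores the order of left and right.
case class UnorderedPair[A](left: A, right: A) {
  override def equals(obj: Any): Boolean = obj match {
    case other: UnorderedPair[_] =>
      (left == other.left && right == other.right) ||
        (left == other.right && right == other.left)
    case _ => false
  }

  // Multiplication is commutative, so swapped pairs hash identically.
  override def hashCode(): Int = left.hashCode() * right.hashCode()
}

object ReduceLocalDemo {
  // Hypothetical local stand-in for reduceByKey: groups by key using
  // equals/hashCode only; no Ordering[K] appears anywhere in the signature.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  val data: Seq[(UnorderedPair[Int], Seq[Long])] = Seq(
    (UnorderedPair(12, 14), Seq(123L)),
    (UnorderedPair(14, 12), Seq(456L)) // same key, sides swapped
  )

  // Both entries collapse into a single key whose value is the concatenation.
  val reduced: Map[UnorderedPair[Int], Seq[Long]] =
    reduceByKeyLocal(data)(_ ++ _)
}
```

This says nothing about why the Spark call asks for an Ordering; it only shows that the key type itself is sufficient for a hash-based reduction.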

The workaround was to rewrite the case class with a dummy Ordered implementation, as follows:

case class UnorderedPair[A](left: A, right: A)
    extends Ordered[UnorderedPair[A]] with Serializable {

  override def equals(obj: Any): Boolean = obj match {
    // Wildcard type pattern: the element type is erased at runtime anyway.
    case other: UnorderedPair[_] =>
      (this.left == other.left && this.right == other.right) ||
        (this.left == other.right && this.right == other.left)
    case _ => false
  }

  override def hashCode(): Int = left.hashCode() * right.hashCode()

  def toSeq(): Seq[A] = Seq(left, right)

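  // Dummy implementation: exists only to satisfy the Ordering requirement.
  // (The original used an undefined Defect class; a standard exception is used here.)
  override def compare(that: UnorderedPair[A]): Int =
    throw new UnsupportedOperationException(
      "This function should not be called. It is a workaround for a Spark bug " +
        "in reduceByKey which requires an Ordering function.")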
}




