[ https://issues.apache.org/jira/browse/SPARK-9785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-9785:
------------------------------
    Summary: HashPartitioning compatibility should consider expression ordering
 (was: HashPartitioning compatibility should be sensitive to expression ordering)

> HashPartitioning compatibility should consider expression ordering
> ------------------------------------------------------------------
>
>                 Key: SPARK-9785
>                 URL: https://issues.apache.org/jira/browse/SPARK-9785
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> HashPartitioning compatibility is defined with respect to the _set_ of
> expressions, but in other contexts, such as computing a row's hash code,
> the ordering of those expressions matters. This is illustrated by the
> following regression test:
> {code}
>   test("HashPartitioning compatibility") {
>     val expressions = Seq(Literal(2), Literal(3))
>     // Consider two HashPartitionings that have the same _set_ of hash expressions but which are
>     // created with different orderings of those expressions:
>     val partitioningA = HashPartitioning(expressions, 100)
>     val partitioningB = HashPartitioning(expressions.reverse, 100)
>     // These partitionings are not considered equal:
>     assert(partitioningA != partitioningB)
>     // However, they both satisfy the same clustered distribution:
>     val distribution = ClusteredDistribution(expressions)
>     assert(partitioningA.satisfies(distribution))
>     assert(partitioningB.satisfies(distribution))
>     // Both partitionings are compatible with and guarantee each other:
>     assert(partitioningA.compatibleWith(partitioningB))
>     assert(partitioningB.compatibleWith(partitioningA))
>     assert(partitioningA.guarantees(partitioningB))
>     assert(partitioningB.guarantees(partitioningA))
>     // Given all of this, we would expect these partitionings to compute the same hashcode for
>     // any given row:
>     def computeHashCode(partitioning: HashPartitioning): Int = {
>       val hashExprProj = new InterpretedMutableProjection(partitioning.expressions, Seq.empty)
>       hashExprProj.apply(InternalRow.empty).hashCode()
>     }
>     assert(computeHashCode(partitioningA) === computeHashCode(partitioningB))
>   }
> {code}
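
The mismatch described above can also be reproduced without any Spark classes. The following is a minimal sketch, not Spark's implementation: ToyHashPartitioning, compatibleBySet, compatibleByOrder and partitionOf are hypothetical names invented for illustration, rows are plain Seq[Int], and the hash is a simple 31-based fold rather than whatever Spark uses internally. It shows that a set-based compatibility check can return true for two partitionings that send the same row to different partitions.

```scala
// Hypothetical stand-in for HashPartitioning; this is NOT Spark's class.
// `columns` are ordinals into a row represented as a Seq[Int].
final case class ToyHashPartitioning(columns: Seq[Int], numPartitions: Int) {

  // Order-insensitive check, modelling the set-based definition in the issue.
  def compatibleBySet(other: ToyHashPartitioning): Boolean =
    numPartitions == other.numPartitions && columns.toSet == other.columns.toSet

  // Order-sensitive check: one possible stricter definition (an assumption,
  // not necessarily the fix that was applied in Spark).
  def compatibleByOrder(other: ToyHashPartitioning): Boolean =
    numPartitions == other.numPartitions && columns == other.columns

  // Project the row in the declared column order, then fold an
  // order-dependent hash over the projected values.
  def partitionOf(row: Seq[Int]): Int = {
    val projected = columns.map(row)
    val hash = projected.foldLeft(0)((acc, v) => 31 * acc + v)
    java.lang.Math.floorMod(hash, numPartitions)
  }
}

object HashPartitioningOrderingSketch {
  def main(args: Array[String]): Unit = {
    val a = ToyHashPartitioning(Seq(0, 1), 100)
    val b = ToyHashPartitioning(Seq(1, 0), 100) // same columns, reversed
    val row = Seq(2, 3)

    println(a.compatibleBySet(b))   // true: same set of columns
    println(a.compatibleByOrder(b)) // false: ordering differs
    println(a.partitionOf(row))     // 65 = 31 * 2 + 3
    println(b.partitionOf(row))     // 95 = 31 * 3 + 2
  }
}
```

Under these assumptions, two "compatible" partitionings disagree on where the row belongs, which is the kind of inconsistency the regression test above is meant to catch.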



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
