spark git commit: [SPARK-17863][SQL] should not add column into Distinct

yhuai Fri, 14 Oct 2016 14:45:47 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 d7fa3e324 -> c53b83749



[SPARK-17863][SQL] should not add column into Distinct

## What changes were proposed in this pull request?

When resolving an attribute referenced in a Sort, the analyzer pulls up the 
missing column from the grandchild into the child. That is wrong when the 
child is a Distinct: the added column becomes part of the set of columns being 
deduplicated, which changes the result. This patch stops adding columns to 
Distinct.
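
To illustrate why an extra column under Distinct changes the result, here is a 
minimal sketch using plain Scala collections rather than Spark. The object, 
case class, and method names are hypothetical and do not appear in the patch; 
the two input rows mirror the values used in the regression test.

```scala
// Plain-collections sketch; no Spark dependency. All names are hypothetical.
object DistinctPullUpSketch {
  final case class Rec(a: Int, b: Int, c: Int)

  // Two rows that agree on (a, b) and differ only in c.
  val input: Seq[Rec] = Seq(Rec(1, 2, 3), Rec(1, 2, 4))

  // Intended behavior: project (a, b) first, then deduplicate.
  def distinctOnProjected(rows: Seq[Rec]): Seq[(Int, Int)] =
    rows.map(r => (r.a, r.b)).distinct

  // Behavior if an extra column (c) is kept below the deduplication step
  // and only projected away afterwards: the rows no longer collapse.
  def distinctWithExtraColumn(rows: Seq[Rec]): Seq[(Int, Int)] =
    rows.map(r => (r.a, r.b, r.c)).distinct.map { case (a, b, _) => (a, b) }

  def main(args: Array[String]): Unit = {
    println(distinctOnProjected(input))     // List((1,2))
    println(distinctWithExtraColumn(input)) // List((1,2), (1,2))
  }
}
```

The second method returns a duplicate `(1, 2)` row, which is the kind of 
incorrect answer this change is meant to prevent.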

## How was this patch tested?

Added a regression test.

Author: Davies Liu <[email protected]>

Closes #15489 from davies/order_distinct.

(cherry picked from commit da9aeb0fde589f7c21c2f4a32036a68c0353965d)
Signed-off-by: Yin Huai <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c53b8374
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c53b8374
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c53b8374

Branch: refs/heads/branch-2.0
Commit: c53b8374911e801ed98c1436c384f0aef076eaab
Parents: d7fa3e3
Author: Davies Liu <[email protected]>
Authored: Fri Oct 14 14:45:20 2016 -0700
Committer: Yin Huai <[email protected]>
Committed: Fri Oct 14 14:45:29 2016 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  2 ++
 .../org/apache/spark/sql/SQLQuerySuite.scala    | 24 ++++++++++++++++++++
 2 files changed, 26 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c53b8374/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 3e4c769..617f3e0 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -838,6 +838,8 @@ class Analyzer(
           // attributes that its child might have or could have.
           val missing = missingAttrs -- g.child.outputSet
           g.copy(join = true, child = addMissingAttr(g.child, missing))
+        case d: Distinct =>
+          throw new AnalysisException(s"Can't add $missingAttrs to $d")
         case u: UnaryNode =>
           u.withNewChildren(addMissingAttr(u.child, missingAttrs) :: Nil)
         case other =>

http://git-wip-us.apache.org/repos/asf/spark/blob/c53b8374/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index cf25097..3684135 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1096,6 +1096,30 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     )
   }
 
+  test("SPARK-17863: SELECT distinct does not work correctly if order by missing attribute") {
+    checkAnswer(
+      sql("""select distinct struct.a, struct.b
+          |from (
+          |  select named_struct('a', 1, 'b', 2, 'c', 3) as struct
+          |  union all
+          |  select named_struct('a', 1, 'b', 2, 'c', 4) as struct) tmp
+          |order by a, b
+          |""".stripMargin),
+      Row(1, 2) :: Nil)
+
+    val error = intercept[AnalysisException] {
+      sql("""select distinct struct.a, struct.b
+            |from (
+            |  select named_struct('a', 1, 'b', 2, 'c', 3) as struct
+            |  union all
+            |  select named_struct('a', 1, 'b', 2, 'c', 4) as struct) tmp
+            |order by struct.a, struct.b
+            |""".stripMargin)
+    }
+    assert(error.message contains "cannot resolve '`struct.a`' given input columns: [a, b]")
+
+  }
+
   test("cast boolean to string") {
     // TODO Ensure true/false string letter casing is consistent with Hive in all cases.
     checkAnswer(

