spark git commit: [SPARK-12077][SQL] change the default plan for single distinct

yhuai Tue, 01 Dec 2015 20:17:44 -0800

Repository: spark
Updated Branches:
  refs/heads/master d96f8c997 -> 96691feae



[SPARK-12077][SQL] change the default plan for single distinct

We tried to match the Spark 1.5 behavior for single distinct aggregation, 
but that plan is not scalable. We should be robust by default and keep a flag 
to address the performance regression for low-cardinality aggregation.

cc yhuai nongli

Author: Davies Liu <[email protected]>

Closes #10075 from davies/agg_15.
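
For readers who hit the low-cardinality regression mentioned above, the flag can 
be turned back on per session. The following is a minimal sketch, not part of 
the commit: it assumes a Spark 1.6-era SQLContext running locally, and the 
object name, table name, and sample data are hypothetical. The flag is declared 
with isPublic = false in the diff below, so it is an internal setting and may 
not exist in other releases.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical example object; not part of the Spark code base.
object SingleDistinctFlagExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("single-distinct-flag")
      .setMaster("local[2]") // assumption: local run for illustration
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Opt back in to the specialized single-distinct planning rule.
    // After this commit the default is false.
    sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "true")

    // Hypothetical sample data: one grouping column, one distinct column.
    val df = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "a"))).toDF("key", "value")
    df.registerTempTable("t")

    // Print the physical plan so the aggregate operators can be inspected.
    sqlContext.sql("SELECT key, COUNT(DISTINCT value) FROM t GROUP BY key").explain()

    sc.stop()
  }
}
```

Comparing the explain() output with the flag set to "true" and to "false" is one 
way to see which plan is chosen for a given query.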


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96691fea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96691fea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96691fea

Branch: refs/heads/master
Commit: 96691feae0229fd693c29475620be2c4059dd080
Parents: d96f8c9
Author: Davies Liu <[email protected]>
Authored: Tue Dec 1 20:17:12 2015 -0800
Committer: Yin Huai <[email protected]>
Committed: Tue Dec 1 20:17:12 2015 -0800

----------------------------------------------------------------------
 sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala       | 2 +-
 .../test/scala/org/apache/spark/sql/execution/PlannerSuite.scala | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/96691fea/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
index 5ef3a48..58adf64 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -451,7 +451,7 @@ private[spark] object SQLConf {
 
   val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
     booleanConf("spark.sql.specializeSingleDistinctAggPlanning",
-      defaultValue = Some(true),
+      defaultValue = Some(false),
       isPublic = false,
       doc = "When true, if a query only has a single distinct column and it has " +
         "grouping expressions, we will use our planner rule to handle this distinct " +

http://git-wip-us.apache.org/repos/asf/spark/blob/96691fea/sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
index dfec139..a462625 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
@@ -44,10 +44,10 @@ class PlannerSuite extends SharedSQLContext {
         fail(s"Could query play aggregation query $query. Is it an aggregation query?"))
     val aggregations = planned.collect { case n if n.nodeName contains "Aggregate" => n }
 
-    // For the new aggregation code path, there will be three aggregate operator for
+    // For the new aggregation code path, there will be four aggregate operator for
     // distinct aggregations.
     assert(
-      aggregations.size == 2 || aggregations.size == 3,
+      aggregations.size == 2 || aggregations.size == 4,
       s"The plan of query $query does not have partial aggregations.")
   }
 


