Repository: spark
Updated Branches:
refs/heads/master f44999200 -> 962878843
[SPARK-11840][SQL] Restore the 1.5's behavior of planning a single distinct
aggregation.
This change affects queries that have a single distinct column and no
grouping expressions, such as
`SELECT COUNT(DISTINCT a) FROM table`
The plan will be changed from
```
AGG-2 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-2 (count distinct)
      AGG-1 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```
to the following one (1.5 uses this)
```
AGG-2
  AGG-1 (grouping on a)
    Shuffle to a single reducer
      Partial-AGG-1 (grouping on a)
```
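The two plan shapes can be sketched on plain Scala collections to show that they compute the same count. This is only an illustrative model, not Spark code: the object `DistinctPlans`, both function names, and the `numReducers` parameter are hypothetical, and each inner `Seq` stands in for one input partition of column `a`.

```scala
object DistinctPlans {
  // 1.5-style plan: partial grouping on a, shuffle to one reducer, final grouping, count.
  def singleReducerPlan(parts: Seq[Seq[Int]]): Long = {
    val partial = parts.map(_.distinct)   // Partial-AGG-1 (grouping on a)
    val shuffled = partial.flatten        // Shuffle to a single reducer
    val merged = shuffled.distinct        // AGG-1 (grouping on a)
    merged.size.toLong                    // AGG-2
  }

  // Rewritten plan: partial grouping on a, shuffle by a, per-reducer counts, final sum.
  def shuffleByKeyPlan(parts: Seq[Seq[Int]], numReducers: Int): Long = {
    val partial = parts.map(_.distinct)   // Partial-AGG-1
    // Shuffle by a: every occurrence of a value lands on the same reducer.
    val byReducer = partial.flatten.groupBy(a => java.lang.Math.floorMod(a, numReducers))
    // AGG-1 (grouping on a) followed by Partial-AGG-2 (count distinct) on each reducer.
    val partialCounts = byReducer.values.map(_.distinct.size.toLong)
    partialCounts.sum                     // Shuffle to a single reducer, then AGG-2
  }
}
```

In this model the single-reducer plan deduplicates every partially grouped value in one place, while the other plan spreads that work across reducers and sends only per-reducer counts to the final step.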
The first plan is more robust. However, to better benchmark the impact of this
change, we should use 1.5's plan, with the conf
`spark.sql.specializeSingleDistinctAggPlanning` controlling which plan is chosen.
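As a rough sketch of what the flag changes, the rewrite decision can be modeled as a pure function. The object `RewriteDecision` and its parameter names are hypothetical stand-ins for `conf.specializeSingleDistinctAggPlanning`, `distinctAggGroups.size`, and `a.groupingExpressions.isEmpty`. `before` mirrors the condition this commit removes and `after` mirrors the one it adds.

```scala
object RewriteDecision {
  // Condition before this commit: with the flag set, a single distinct group
  // was still rewritten when the aggregate had no grouping expressions.
  def before(specialize: Boolean, numDistinctGroups: Int, groupingIsEmpty: Boolean): Boolean =
    if (specialize) numDistinctGroups > 1 || (numDistinctGroups == 1 && groupingIsEmpty)
    else numDistinctGroups >= 1

  // Condition after this commit: with the flag set, only queries with more
  // than one distinct group are rewritten.
  def after(specialize: Boolean, numDistinctGroups: Int): Boolean =
    if (specialize) numDistinctGroups > 1
    else numDistinctGroups >= 1
}
```

For `SELECT COUNT(DISTINCT a) FROM table` with the flag set, `before` returns true (the query is rewritten) and `after` returns false (the query is left to the Aggregation strategy).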
Author: Yin Huai <[email protected]>
Closes #9828 from yhuai/distinctRewriter.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96287884
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96287884
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96287884
Branch: refs/heads/master
Commit: 962878843b611fa6229e3ee67bb22e2a4bc283cd
Parents: f449992
Author: Yin Huai <[email protected]>
Authored: Thu Nov 19 11:02:17 2015 -0800
Committer: Yin Huai <[email protected]>
Committed: Thu Nov 19 11:02:17 2015 -0800
----------------------------------------------------------------------
.../sql/catalyst/analysis/DistinctAggregationRewriter.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/96287884/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
index c0c9604..9c78f6d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DistinctAggregationRewriter.scala
@@ -126,8 +126,8 @@ case class DistinctAggregationRewriter(conf: CatalystConf) extends Rule[LogicalP
       val shouldRewrite = if (conf.specializeSingleDistinctAggPlanning) {
         // When the flag is set to specialize single distinct agg planning,
         // we will rely on our Aggregation strategy to handle queries with a single
-        // distinct column and this aggregate operator does have grouping expressions.
-        distinctAggGroups.size > 1 || (distinctAggGroups.size == 1 && a.groupingExpressions.isEmpty)
+        // distinct column.
+        distinctAggGroups.size > 1
       } else {
         distinctAggGroups.size >= 1
       }