This is an automated email from the ASF dual-hosted git repository.
yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e24f0dc [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate
functions in SQL references
e24f0dc is described below
commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717
Author: Takeshi Yamamuro <[email protected]>
AuthorDate: Mon Apr 6 21:36:51 2020 +0900
[SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in
SQL references
### What changes were proposed in this pull request?
This PR intends to improve the SQL document of `GROUP BY`; it added the
description about FILTER clauses of aggregate functions.
### Why are the changes needed?
To improve the SQL documents
### Does this PR introduce any user-facing change?
Yes.
<img
src="https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png"
width="500">
### How was this patch tested?
Manually checked.
Closes #28134 from maropu/SPARK-31358.
Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
---
docs/sql-ref-syntax-qry-select-groupby.md | 44 +++++++++++++++++++++++++++++++
docs/sql-ref-syntax-qry-select.md | 1 +
2 files changed, 45 insertions(+)
diff --git a/docs/sql-ref-syntax-qry-select-groupby.md
b/docs/sql-ref-syntax-qry-select-groupby.md
index 49a11ca..c461a18 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -21,6 +21,7 @@ license: |
The <code>GROUP BY</code> clause is used to group the rows based on a set of
specified grouping expressions and compute aggregations on
the group of rows based on one or more specified aggregate functions. Spark
also supports advanced aggregations to do multiple
aggregations for the same input record set via `GROUPING SETS`, `CUBE`,
`ROLLUP` clauses.
+When a FILTER clause is attached to an aggregate function, only the matching
rows are passed to that function.
### Syntax
{% highlight sql %}
@@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
GROUP BY GROUPING SETS (grouping_set [ , ...])
{% endhighlight %}
+While aggregate functions are defined as
+{% highlight sql %}
+aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE
boolean_expression ) ]
+{% endhighlight %}
+
### Parameters
<dl>
<dt><code><em>GROUPING SETS</em></code></dt>
@@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
((warehouse, product), (warehouse), (product), ())</code>.
The N elements of a <code>CUBE</code> specification results in 2^N
<code>GROUPING SETS</code>.
</dd>
+ <dt><code><em>aggregate_name</em></code></dt>
+ <dd>
+ Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
+ </dd>
+ <dt><code><em>DISTINCT</em></code></dt>
+ <dd>
+ Removes duplicates in input rows before they are passed to aggregate
functions.
+ </dd>
+ <dt><code><em>FILTER</em></code></dt>
+ <dd>
+ Filters the input rows for which the <code>boolean_expression</code> in
the <code>WHERE</code> clause evaluates
+ to true are passed to the aggregate function; other rows are discarded.
+ </dd>
</dl>
### Examples
@@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM
dealer GROUP BY id OR
|300|13 |8 |
+---+---+---+
+-- Count the number of distinct dealer cities per car_model.
+SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
+
+ +------------+-----+
+ | car_model|count|
+ +------------+-----+
+ | Honda Civic| 3|
+ | Honda CRV| 2|
+ |Honda Accord| 3|
+ +------------+-----+
+
+-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
+SELECT id, sum(quantity) FILTER (
+ WHERE car_model IN ('Honda Civic', 'Honda CRV')
+ ) AS `sum(quantity)` FROM dealer
+ GROUP BY id ORDER BY id;
+
+ +---+-------------+
+ | id|sum(quantity)|
+ +---+-------------+
+ |100| 17|
+ |200| 23|
+ |300| 5|
+ +---+-------------+
+
-- Aggregations using multiple sets of grouping columns in a single statement.
-- Following performs aggregations based on four sets of grouping columns.
-- 1. city, car_model
diff --git a/docs/sql-ref-syntax-qry-select.md
b/docs/sql-ref-syntax-qry-select.md
index e87c4a5..7ad1dd1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression
[ , ... ] }
<dd>
Specifies the expressions that are used to group the rows. This is used in
conjunction with aggregate functions
(MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping
expressions and aggregate values in each group.
+ When a FILTER clause is attached to an aggregate function, only the
matching rows are passed to that function.
</dd>
<dt><code><em>HAVING</em></code></dt>
<dd>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]