[spark] branch master updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references

yamamuro Mon, 06 Apr 2020 05:38:21 -0700

This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new e24f0dc  [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate 
functions in SQL references
e24f0dc is described below

commit e24f0dcd2754a1db27e9b0f3cf27ee6d7229f717
Author: Takeshi Yamamuro <yamam...@apache.org>
AuthorDate: Mon Apr 6 21:36:51 2020 +0900

    [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in 
SQL references
    
    ### What changes were proposed in this pull request?
    
    This PR intends to improve the SQL document of `GROUP BY`; it added the 
description about FILTER clauses of aggregate functions.
    
    ### Why are the changes needed?
    
    To improve the SQL documents
    
    ### Does this PR introduce any user-facing change?
    
    Yes.
    
    <img 
src="https://user-images.githubusercontent.com/692303/78558612-e2234a80-784d-11ea-9353-b3feac4d57a7.png";
 width="500">
    
    ### How was this patch tested?
    
    Manually checked.
    
    Closes #28134 from maropu/SPARK-31358.
    
    Authored-by: Takeshi Yamamuro <yamam...@apache.org>
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
---
 docs/sql-ref-syntax-qry-select-groupby.md | 44 +++++++++++++++++++++++++++++++
 docs/sql-ref-syntax-qry-select.md         |  1 +
 2 files changed, 45 insertions(+)

diff --git a/docs/sql-ref-syntax-qry-select-groupby.md 
b/docs/sql-ref-syntax-qry-select-groupby.md
index 49a11ca..c461a18 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -21,6 +21,7 @@ license: |
 The <code>GROUP BY</code> clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on
 the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, 
`ROLLUP` clauses.
+When a FILTER clause is attached to an aggregate function, only the matching 
rows are passed to that function.
 
 ### Syntax
 {% highlight sql %}
@@ -30,6 +31,11 @@ GROUP BY group_expression [ , group_expression [ , ... ] ]
 GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
+While aggregate functions are defined as
+{% highlight sql %}
+aggregate_name ( [ DISTINCT ] expression [ , ... ] ) [ FILTER ( WHERE 
boolean_expression ) ]
+{% endhighlight %}
+
 ### Parameters
 <dl>
   <dt><code><em>GROUPING SETS</em></code></dt>
@@ -70,6 +76,19 @@ GROUP BY GROUPING SETS (grouping_set [ , ...])
     ((warehouse, product), (warehouse), (product), ())</code>.
     The N elements of a <code>CUBE</code> specification results in 2^N 
<code>GROUPING SETS</code>.
   </dd>
+  <dt><code><em>aggregate_name</em></code></dt>
+  <dd>
+    Specifies an aggregate function name (MIN, MAX, COUNT, SUM, AVG, etc.).
+  </dd>
+  <dt><code><em>DISTINCT</em></code></dt>
+  <dd>
+    Removes duplicates in input rows before they are passed to aggregate 
functions.
+  </dd>
+  <dt><code><em>FILTER</em></code></dt>
+  <dd>
+    Filters the input rows for which the <code>boolean_expression</code> in 
the <code>WHERE</code> clause evaluates
+    to true are passed to the aggregate function; other rows are discarded.
+  </dd>
 </dl>
 
 ### Examples
@@ -120,6 +139,31 @@ SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM 
dealer GROUP BY id OR
   |300|13 |8  |
   +---+---+---+
 
+-- Count the number of distinct dealer cities per car_model.
+SELECT car_model, count(DISTINCT city) AS count FROM dealer GROUP BY car_model;
+
+  +------------+-----+
+  |   car_model|count|
+  +------------+-----+
+  | Honda Civic|    3|
+  |   Honda CRV|    2|
+  |Honda Accord|    3|
+  +------------+-----+
+
+-- Sum of only 'Honda Civic' and 'Honda CRV' quantities per dealership.
+SELECT id, sum(quantity) FILTER (
+            WHERE car_model IN ('Honda Civic', 'Honda CRV')
+        ) AS `sum(quantity)` FROM dealer
+    GROUP BY id ORDER BY id;
+
+   +---+-------------+
+   | id|sum(quantity)|
+   +---+-------------+
+   |100|           17|
+   |200|           23|
+   |300|            5|
+   +---+-------------+
+
 -- Aggregations using multiple sets of grouping columns in a single statement.
 -- Following performs aggregations based on four sets of grouping columns.
 -- 1. city, car_model
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index e87c4a5..7ad1dd1 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -92,6 +92,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression 
[ , ... ] }
   <dd>
     Specifies the expressions that are used to group the rows. This is used in 
conjunction with aggregate functions
     (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping 
expressions and aggregate values in each group.
+    When a FILTER clause is attached to an aggregate function, only the 
matching rows are passed to that function.
   </dd>
   <dt><code><em>HAVING</em></code></dt>
   <dd>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31358][SQL][DOC] Document FILTER clauses of aggregate functions in SQL references

Reply via email to