[spark] branch branch-3.0 updated: [SPARK-31866][SQL][DOCS] Add COALESCE/REPARTITION/REPARTITION_BY_RANGE Hints to SQL Reference

srowen Sat, 30 May 2020 12:57:57 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5fa46eb  [SPARK-31866][SQL][DOCS] Add 
COALESCE/REPARTITION/REPARTITION_BY_RANGE Hints to SQL Reference
5fa46eb is described below

commit 5fa46eb3d50281943a446e6d10fc7c6621c011cd
Author: Huaxin Gao <[email protected]>
AuthorDate: Sat May 30 14:51:45 2020 -0500

    [SPARK-31866][SQL][DOCS] Add COALESCE/REPARTITION/REPARTITION_BY_RANGE 
Hints to SQL Reference
    
    Add Coalesce/Repartition/Repartition_By_Range Hints to SQL Reference
    
    To make SQL reference complete
    
    <img width="1100" alt="Screen Shot 2020-05-29 at 6 46 38 PM" src="https://user-images.githubusercontent.com/13592258/83316782-d6fcf300-a1dc-11ea-87f6-e357b9c739fd.png">
    
    <img width="1099" alt="Screen Shot 2020-05-29 at 6 43 30 PM" src="https://user-images.githubusercontent.com/13592258/83316784-d8c6b680-a1dc-11ea-95ea-10a1f75dcef9.png">
    
    Only the above pages are changed. The following two pages are the same 
as before.
    
    <img width="1100" alt="Screen Shot 2020-05-28 at 10 05 27 PM" src="https://user-images.githubusercontent.com/13592258/83223474-bfb3fc00-a12f-11ea-807a-824a618afa0b.png">
    
    <img width="1099" alt="Screen Shot 2020-05-28 at 10 05 08 PM" src="https://user-images.githubusercontent.com/13592258/83223478-c2165600-a12f-11ea-806e-a1e57dc35ef4.png">
    
    Manually build and check
    
    Closes #28672 from huaxingao/coalesce_hint.
    
    Authored-by: Huaxin Gao <[email protected]>
    Signed-off-by: Sean Owen <[email protected]>
    (cherry picked from commit 1b780f364bfbb46944fe805a024bb6c32f5d2dde)
    Signed-off-by: Sean Owen <[email protected]>
---
 docs/_data/menu-sql.yaml                           |  8 +--
 docs/sql-performance-tuning.md                     |  4 ++
 docs/sql-ref-syntax-qry-select-hints.md            | 83 ++++++++++++++++++++--
 docs/sql-ref-syntax-qry-select-join.md             |  2 +-
 ...ng.md => sql-ref-syntax-qry-select-sampling.md} |  0
 ...ndow.md => sql-ref-syntax-qry-select-window.md} |  0
 docs/sql-ref-syntax-qry-select.md                  |  6 +-
 docs/sql-ref-syntax-qry.md                         |  6 +-
 docs/sql-ref-syntax.md                             |  6 +-
 9 files changed, 95 insertions(+), 20 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 289a9d3..219e680 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -171,22 +171,22 @@
                   url: sql-ref-syntax-qry-select-limit.html
                 - text: Common Table Expression
                   url: sql-ref-syntax-qry-select-cte.html
+                - text: Hints
+                  url: sql-ref-syntax-qry-select-hints.html
                 - text: Inline Table
                   url: sql-ref-syntax-qry-select-inline-table.html
                 - text: JOIN
                   url: sql-ref-syntax-qry-select-join.html
-                - text: Join Hints
-                  url: sql-ref-syntax-qry-select-hints.html
                 - text: LIKE Predicate
                   url: sql-ref-syntax-qry-select-like.html
                 - text: Set Operators
                   url: sql-ref-syntax-qry-select-setops.html
                 - text: TABLESAMPLE
-                  url: sql-ref-syntax-qry-sampling.html
+                  url: sql-ref-syntax-qry-select-sampling.html
                 - text: Table-valued Function
                   url: sql-ref-syntax-qry-select-tvf.html
                 - text: Window Function
-                  url: sql-ref-syntax-qry-window.html
+                  url: sql-ref-syntax-qry-select-window.html
             - text: EXPLAIN
               url: sql-ref-syntax-qry-explain.html
         - text: Auxiliary Statements
diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 5b784a5..5e6f049 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -179,6 +179,8 @@ SELECT /*+ BROADCAST(r) */ * FROM records r JOIN src s ON 
r.key = s.key
 </div>
 </div>
 
+For more details please refer to the documentation of [Join 
Hints](sql-ref-syntax-qry-select-hints.html#join-hints).
+
 ## Coalesce Hints for SQL Queries
 
 Coalesce hints allows the Spark SQL users to control the number of output 
files just like the
@@ -194,6 +196,8 @@ The "REPARTITION_BY_RANGE" hint must have column names and 
a partition number is
     SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
 
+For more details please refer to the documentation of [Partitioning 
Hints](sql-ref-syntax-qry-select-hints.html#partitioning-hints).
+
 ## Adaptive Query Execution
 Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
makes use of the runtime statistics to choose the most efficient query 
execution plan. AQE is disabled by default. Spark SQL can use the umbrella 
configuration of `spark.sql.adaptive.enabled` to control whether turn it 
on/off. As of Spark 3.0, there are three major features in AQE, including 
coalescing post-shuffle partitions, converting sort-merge join to broadcast 
join, and skew join optimization.
 
diff --git a/docs/sql-ref-syntax-qry-select-hints.md 
b/docs/sql-ref-syntax-qry-select-hints.md
index 4bb48b0..247ce48 100644
--- a/docs/sql-ref-syntax-qry-select-hints.md
+++ b/docs/sql-ref-syntax-qry-select-hints.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Join Hints
-displayTitle: Join Hints
+title: Hints
+displayTitle: Hints
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -21,15 +21,86 @@ license: |
 
 ### Description
 
-Join Hints allow users to suggest the join strategy that Spark should use. 
Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`, 
`SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0. 
When different join strategy hints are specified on both sides of a join, Spark 
prioritizes hints in the following order: `BROADCAST` over `MERGE` over 
`SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with 
the `BROADCAST` hint or the `SH [...]
+Hints give users a way to suggest how Spark SQL should use specific approaches to 
generate its execution plan.
 
 ### Syntax
 
 ```sql
-/*+ join_hint [ , ... ] */
+/*+ hint [ , ... ] */
 ```
 
-### Join Hints Types
+### Partitioning Hints
+
+Partitioning hints allow users to suggest a partitioning strategy that Spark 
should follow. `COALESCE`, `REPARTITION`,
+and `REPARTITION_BY_RANGE` hints are supported and are equivalent to 
`coalesce`, `repartition`, and
+`repartitionByRange` [Dataset 
APIs](api/scala/org/apache/spark/sql/Dataset.html), respectively. These hints 
give users
+a way to tune performance and control the number of output files in Spark SQL. 
When multiple partitioning hints are
+specified, multiple nodes are inserted into the logical plan, but the leftmost 
hint is picked by the optimizer.
+
+#### Partitioning Hints Types
+
+* **COALESCE**
+
+  The `COALESCE` hint can be used to reduce the number of partitions to the 
specified number of partitions. It takes a partition number as a parameter.
+
+* **REPARTITION**
+
+  The `REPARTITION` hint can be used to repartition to the specified number of 
partitions using the specified partitioning expressions. It takes a partition 
number, column names, or both as parameters.
+
+* **REPARTITION_BY_RANGE**
+
+  The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified 
number of partitions using the specified partitioning expressions. It takes 
column names and an optional partition number as parameters.
+
+#### Examples
+
+```sql
+SELECT /*+ COALESCE(3) */ * FROM t;
+
+SELECT /*+ REPARTITION(3) */ * FROM t;
+
+SELECT /*+ REPARTITION(c) */ * FROM t;
+
+SELECT /*+ REPARTITION(3, c) */ * FROM t;
+
+SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;
+
+SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;
+
+-- multiple partitioning hints
+EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), 
REPARTITION_BY_RANGE(3, c) */ * FROM t;
+== Parsed Logical Plan ==
+'UnresolvedHint REPARTITION, [100]
++- 'UnresolvedHint COALESCE, [500]
+   +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
+      +- 'Project [*]
+         +- 'UnresolvedRelation [t]
+
+== Analyzed Logical Plan ==
+name: string, c: int
+Repartition 100, true
++- Repartition 500, false
+   +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
+      +- Project [name#29, c#30]
+         +- SubqueryAlias spark_catalog.default.t
+            +- Relation[name#29,c#30] parquet
+
+== Optimized Logical Plan ==
+Repartition 100, true
++- Relation[name#29,c#30] parquet
+
+== Physical Plan ==
+Exchange RoundRobinPartitioning(100), false, [id=#121]
++- *(1) ColumnarToRow
+   +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], 
Format: Parquet,
+      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], 
PartitionFilters: [],
+      PushedFilters: [], ReadSchema: struct<name:string>
+```
+
+### Join Hints
+
+Join hints allow users to suggest the join strategy that Spark should use. 
Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. Support for the `MERGE`, 
`SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Join Hints was added in 3.0. 
When different join strategy hints are specified on both sides of a join, Spark 
prioritizes hints in the following order: `BROADCAST` over `MERGE` over 
`SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with 
the `BROADCAST` hint or the `SH [...]
+
+#### Join Hints Types
 
 * **BROADCAST**
 
@@ -47,7 +118,7 @@ Join Hints allow users to suggest the join strategy that 
Spark should use. Prior
 
     Suggests that Spark use shuffle-and-replicate nested loop join.
 
-### Examples
+#### Examples
 
 ```sql
 -- Join Hints for broadcast join
diff --git a/docs/sql-ref-syntax-qry-select-join.md 
b/docs/sql-ref-syntax-qry-select-join.md
index 28b21f5..09b0efd 100644
--- a/docs/sql-ref-syntax-qry-select-join.md
+++ b/docs/sql-ref-syntax-qry-select-join.md
@@ -235,4 +235,4 @@ SELECT * FROM employee ANTI JOIN department ON 
employee.deptno = department.dept
 ### Related Statements
 
 * [SELECT](sql-ref-syntax-qry-select.html)
-* [Join Hints](sql-ref-syntax-qry-select-hints.html)
+* [Hints](sql-ref-syntax-qry-select-hints.html)
diff --git a/docs/sql-ref-syntax-qry-sampling.md 
b/docs/sql-ref-syntax-qry-select-sampling.md
similarity index 100%
rename from docs/sql-ref-syntax-qry-sampling.md
rename to docs/sql-ref-syntax-qry-select-sampling.md
diff --git a/docs/sql-ref-syntax-qry-window.md 
b/docs/sql-ref-syntax-qry-select-window.md
similarity index 100%
rename from docs/sql-ref-syntax-qry-window.md
rename to docs/sql-ref-syntax-qry-select-window.md
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index 1aeecdb..987e647 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -151,11 +151,11 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { 
named_expression [ , ... ] }
 * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
 * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
 * [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
+* [Hints](sql-ref-syntax-qry-select-hints.html)
 * [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
 * [JOIN](sql-ref-syntax-qry-select-join.html)
-* [Join Hints](sql-ref-syntax-qry-select-hints.html)
 * [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
 * [Set Operators](sql-ref-syntax-qry-select-setops.html)
-* [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
+* [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
 * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
-* [Window Function](sql-ref-syntax-qry-window.html)
+* [Window Function](sql-ref-syntax-qry-select-window.html)
diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md
index 8accdfe..167c394 100644
--- a/docs/sql-ref-syntax-qry.md
+++ b/docs/sql-ref-syntax-qry.md
@@ -37,12 +37,12 @@ ability to generate logical and physical plan for a given 
query using
   * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
   * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
   * [Common Table Expression](sql-ref-syntax-qry-select-cte.html)
+  * [Hints](sql-ref-syntax-qry-select-hints.html)
   * [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
   * [JOIN](sql-ref-syntax-qry-select-join.html)
-  * [Join Hints](sql-ref-syntax-qry-select-hints.html)
   * [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
   * [Set Operators](sql-ref-syntax-qry-select-setops.html)
-  * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
+  * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
   * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
-  * [Window Function](sql-ref-syntax-qry-window.html)
+  * [Window Function](sql-ref-syntax-qry-select-window.html)
 * [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
index 98e3065..d78a01f 100644
--- a/docs/sql-ref-syntax.md
+++ b/docs/sql-ref-syntax.md
@@ -54,18 +54,18 @@ Spark SQL is Apache Spark's module for working with 
structured data. The SQL Syn
    * [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
    * [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
    * [HAVING Clause](sql-ref-syntax-qry-select-having.html)
+   * [Hints](sql-ref-syntax-qry-select-hints.html)
    * [Inline Table](sql-ref-syntax-qry-select-inline-table.html)
    * [JOIN](sql-ref-syntax-qry-select-join.html)
-   * [Join Hints](sql-ref-syntax-qry-select-hints.html)
    * [LIKE Predicate](sql-ref-syntax-qry-select-like.html)
    * [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
    * [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
    * [Set Operators](sql-ref-syntax-qry-select-setops.html)
    * [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
-   * [TABLESAMPLE](sql-ref-syntax-qry-sampling.html)
+   * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
    * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
    * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
-   * [Window Function](sql-ref-syntax-qry-window.html)
+   * [Window Function](sql-ref-syntax-qry-select-window.html)
  * [EXPLAIN](sql-ref-syntax-qry-explain.html)
 
 ### Auxiliary Statements
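
As a rough illustration of the parameter rules documented in the new Partitioning Hints section (`COALESCE` takes a partition number; `REPARTITION` takes a partition number, column names, or both; `REPARTITION_BY_RANGE` takes column names and an optional partition number), here is a minimal Python sketch. The helper names `partitioning_hint` and `select_with_hint` are hypothetical and not part of Spark; the code only builds the hint text under those stated rules and does not run any query.

```python
from typing import Optional, Sequence


def partitioning_hint(
    name: str,
    num_partitions: Optional[int] = None,
    columns: Sequence[str] = (),
) -> str:
    """Build a /*+ ... */ hint comment (hypothetical helper, not a Spark API)."""
    name = name.upper()
    cols = list(columns)

    if name == "COALESCE":
        # COALESCE takes a partition number as its parameter.
        if num_partitions is None or cols:
            raise ValueError("COALESCE takes only a partition number")
    elif name == "REPARTITION":
        # REPARTITION takes a partition number, column names, or both.
        if num_partitions is None and not cols:
            raise ValueError("REPARTITION needs a partition number or columns")
    elif name == "REPARTITION_BY_RANGE":
        # REPARTITION_BY_RANGE takes column names and an optional partition number.
        if not cols:
            raise ValueError("REPARTITION_BY_RANGE needs column names")
    else:
        raise ValueError(f"not a partitioning hint: {name}")

    # The partition number, when present, comes before the column names.
    args = ([str(num_partitions)] if num_partitions is not None else []) + cols
    return f"/*+ {name}({', '.join(args)}) */"


def select_with_hint(hint: str, table: str) -> str:
    """Place the hint right after SELECT, as in the examples in the diff."""
    return f"SELECT {hint} * FROM {table}"


if __name__ == "__main__":
    print(select_with_hint(partitioning_hint("COALESCE", 3), "t"))
    print(select_with_hint(partitioning_hint("REPARTITION", 3, ["c"]), "t"))
    print(select_with_hint(partitioning_hint("REPARTITION_BY_RANGE", columns=["c"]), "t"))
```

The generated strings have the same shape as the example queries added to `docs/sql-ref-syntax-qry-select-hints.md`, for example `SELECT /*+ REPARTITION(3, c) */ * FROM t`. Rejecting invalid argument combinations is a design choice of this sketch only; how Spark itself reports an invalid hint is not covered by this commit.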

