This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new db6ba04  [SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs

db6ba04 is described below

commit db6ba049c43e2aa1521ed39c9f2b802ad04d111f
Author: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
AuthorDate: Thu Oct 1 08:15:53 2020 +0900

    [SPARK-31753][SQL][DOCS][FOLLOW-UP] Add missing keywords in the SQL docs

    ### What changes were proposed in this pull request?
    Update the sql-ref docs. The following keywords are added in this PR:

    CLUSTERED BY
    SORTED BY
    INTO num_buckets BUCKETS

    ### Why are the changes needed?
    Let more users know how to use these SQL keywords.

    ### Does this PR introduce _any_ user-facing change?
    No

    ![image](https://user-images.githubusercontent.com/46367746/94428281-0a6b8080-01c3-11eb-9ff3-899f8da602ca.png)
    ![image](https://user-images.githubusercontent.com/46367746/94428285-0d667100-01c3-11eb-8a54-90e7641d917b.png)
    ![image](https://user-images.githubusercontent.com/46367746/94428288-0f303480-01c3-11eb-9e1d-023538aa6e2d.png)

    ### How was this patch tested?
    Tested by generating the HTML docs.

    Closes #29883 from GuoPhilipse/add-sql-missing-keywords.

    Lead-authored-by: GuoPhilipse <46367746+guophili...@users.noreply.github.com>
    Co-authored-by: GuoPhilipse <guofei...@126.com>
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
    (cherry picked from commit 3bdbb5546d2517dda6f71613927cc1783c87f319)
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
---
 docs/sql-ref-syntax-ddl-create-table-datasource.md |  7 ++++-
 docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 32 ++++++++++++++++++++++
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md b/docs/sql-ref-syntax-ddl-create-table-datasource.md
index d334447..ba0516a 100644
--- a/docs/sql-ref-syntax-ddl-create-table-datasource.md
+++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -67,7 +67,12 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
 * **SORTED BY**
 
-    Determines the order in which the data is stored in buckets. Default is Ascending order.
+    Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+    If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+    Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
 
 * **LOCATION**
 
diff --git a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
index 7bf847d..3a8c8d5 100644
--- a/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
+++ b/docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -31,6 +31,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
     [ COMMENT table_comment ]
     [ PARTITIONED BY ( col_name2[:] col_type2 [ COMMENT col_comment2 ], ... )
         | ( col_name1, col_name2, ... ) ]
+    [ CLUSTERED BY ( col_name1, col_name2, ...)
+        [ SORTED BY ( col_name1 [ ASC | DESC ], col_name2 [ ASC | DESC ], ... ) ]
+        INTO num_buckets BUCKETS ]
     [ ROW FORMAT row_format ]
     [ STORED AS file_format ]
     [ LOCATION path ]
@@ -65,6 +68,21 @@ as any order. For example, you can write COMMENT table_comment after TBLPROPERTI
 
     Partitions are created on the table, based on the columns specified.
 
+* **CLUSTERED BY**
+
+    Partitions created on the table will be bucketed into fixed buckets based on the column specified for bucketing.
+
+    **NOTE:** Bucketing is an optimization technique that uses buckets (and bucketing columns) to determine data partitioning and avoid data shuffle.
+
+* **SORTED BY**
+
+    Specifies an ordering of bucket columns. Optionally, one can use ASC for an ascending order or DESC for a descending order after any column names in the SORTED BY clause.
+    If not specified, ASC is assumed by default.
+
+* **INTO num_buckets BUCKETS**
+
+    Specifies buckets numbers, which is used in `CLUSTERED BY` clause.
+
 * **row_format**
 
     Use the `SERDE` clause to specify a custom SerDe for one table. Otherwise, use the `DELIMITED` clause to use the native SerDe and specify the delimiter, escape character, null character and so on.
@@ -203,6 +221,20 @@ CREATE EXTERNAL TABLE family (id INT, name STRING)
     STORED AS INPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleInputFormat'
         OUTPUTFORMAT 'com.ly.spark.example.serde.io.SerDeExampleOutputFormat'
     LOCATION '/tmp/family/';
+
+--Use `CLUSTERED BY` clause to create bucket table without `SORTED BY`
+CREATE TABLE clustered_by_test1 (ID INT, AGE STRING)
+    CLUSTERED BY (ID)
+    INTO 4 BUCKETS
+    STORED AS ORC
+
+--Use `CLUSTERED BY` clause to create bucket table with `SORTED BY`
+CREATE TABLE clustered_by_test2 (ID INT, NAME STRING)
+    PARTITIONED BY (YEAR STRING)
+    CLUSTERED BY (ID, NAME)
+    SORTED BY (ID ASC)
+    INTO 3 BUCKETS
+    STORED AS PARQUET
 ```
 
 ### Related Statements
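
The effect of `CLUSTERED BY ... SORTED BY ... INTO num_buckets BUCKETS` on the data can be illustrated with a small Python sketch. The helper names (`murmur3_hash_int`, `bucket_id`, `bucketize`) are hypothetical. The sketch assumes INT bucket columns only and models the bucket id loosely on Spark's hash partitioning: a Murmur3 x86_32 hash chained across the bucket columns with an assumed seed of 42, followed by a non-negative modulo by `num_buckets`. It is not verified to reproduce the bucket ids Spark actually writes, and it sorts each bucket in ascending order only (the documented default).

```python
from collections import defaultdict

MASK32 = 0xFFFFFFFF
SEED = 42  # assumption: modeled on the default seed of Spark's Murmur3Hash expression


def _rotl32(x, r):
    """Rotate a 32-bit unsigned value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_hash_int(value, seed):
    """Murmur3 x86_32 of one 32-bit int, returned as a signed (Java-style) int."""
    k1 = (value & MASK32) * 0xCC9E2D51 & MASK32
    k1 = _rotl32(k1, 15) * 0x1B873593 & MASK32
    h1 = _rotl32((seed & MASK32) ^ k1, 13)
    h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Finalization mix; the input length is 4 bytes.
    h1 ^= 4
    h1 ^= h1 >> 16
    h1 = h1 * 0x85EBCA6B & MASK32
    h1 ^= h1 >> 13
    h1 = h1 * 0xC2B2AE35 & MASK32
    h1 ^= h1 >> 16
    return h1 - (1 << 32) if h1 & 0x80000000 else h1


def bucket_id(row, bucket_cols, num_buckets):
    """Bucket for CLUSTERED BY (bucket_cols) INTO num_buckets BUCKETS (INT columns only)."""
    h = SEED
    for col in bucket_cols:
        h = murmur3_hash_int(row[col], h)  # each column's hash seeds the next one
    return h % num_buckets  # Python's % with a positive divisor is non-negative (pmod)


def bucketize(rows, bucket_cols, num_buckets, sort_cols=()):
    """Group rows into buckets, then sort each bucket by sort_cols in ascending order."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row, bucket_cols, num_buckets)].append(row)
    for bucket_rows in buckets.values():
        bucket_rows.sort(key=lambda r: tuple(r[c] for c in sort_cols))
    return dict(buckets)


if __name__ == "__main__":
    # Roughly mirrors clustered_by_test1: CLUSTERED BY (ID) SORTED BY (ID) INTO 4 BUCKETS
    rows = [{"ID": i, "AGE": str(20 + i % 7)} for i in range(10)]
    for bid, bucket_rows in sorted(bucketize(rows, ["ID"], 4, ["ID"]).items()):
        print(bid, [r["ID"] for r in bucket_rows])
```

Because a row's bucket depends only on its bucket-column values and `num_buckets`, two tables clustered by the same column into the same number of buckets keep equal keys under the same bucket id. That co-location is what lets bucketing avoid a shuffle, as the NOTE under `CLUSTERED BY` says.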