This is an automated email from the ASF dual-hosted git repository.
dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 70c35425d IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax
70c35425d is described below
commit 70c35425d3f8ac68b23fb8e0d08e12ee763965d7
Author: Noemi Pap-Takacs <[email protected]>
AuthorDate: Tue Feb 27 17:43:23 2024 +0100
IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax
Extended the ALTER TABLE documentation with the SORT BY clause.
Also added more information about the available and the default
sort orders to the CREATE TABLE description.
Testing: Built docs locally.
Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116
Reviewed-on: http://gerrit.cloudera.org:8080/21083
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Daniel Becker <[email protected]>
---
docs/topics/impala_alter_table.xml | 23 ++++++++++++++++++++++-
docs/topics/impala_create_table.xml | 19 +++++++++++++------
2 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml
index 3c64ce99d..1d7fc835c 100644
--- a/docs/topics/impala_alter_table.xml
+++ b/docs/topics/impala_alter_table.xml
@@ -511,6 +511,27 @@ ALTER TABLE <varname>table_name</varname> SET SERDEPROPERTIES ('<varname>key1</v
clauses.
</p>
+ <p>
+ <b>To specify a sort order for new records that are added to the table:</b>
+ </p>
+
+<codeblock>ALTER TABLE <varname>table_name</varname> SORT BY [LEXICAL|ZORDER](<varname>column_name1</varname>[, <varname>column_name2</varname> ...]);</codeblock>
+
+ <p>
+ Specifying the sort order is optional. The default sort order is <codeph>LEXICAL</codeph>.
+ Setting the <codeph>SORT BY</codeph> property will not rewrite existing data files,
+ but subsequent inserts will be ordered. Sorting is always ascending.
+ </p>
+
+ <p>
+ Use the <codeph>DESCRIBE FORMATTED</codeph> statement to see the current sort
+ properties ('<codeph>sort.columns</codeph>' and '<codeph>sort.order</codeph>')
+ for an existing table. They can also be set using <codeph>SET TBLPROPERTIES</codeph>.
+ </p>
+ <p>
+ For details about sort order see <xref href="impala_create_table.xml#create_table"/>.
+ </p>
+
<p>
<b>To manually set or update table or column statistics:</b>
</p>
@@ -736,7 +757,7 @@ optional int32 x [i:1 d:1 r:0]
<p>
Use an <codeph>ALTER TABLE ... SET FILEFORMAT</codeph> clause. You can include an optional
<codeph>PARTITION (<varname>col1</varname>=<varname>val1</varname>,
- <varname>col2</varname>=<varname>val2</varname>, ...</codeph> clause so that the file
+ <varname>col2</varname>=<varname>val2</varname>, ...</codeph>) clause so that the file
format is changed for a specific partition rather than the entire table.
</p>
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
index 50e3ad7e8..263895c40 100644
--- a/docs/topics/impala_create_table.xml
+++ b/docs/topics/impala_create_table.xml
@@ -450,11 +450,18 @@ AS
<p rev="2.9.0 IMPALA-4166">
The optional <codeph>SORT BY</codeph> clause lets you specify zero or more columns that
- are sorted in the data files created by each Impala <codeph>INSERT</codeph> or
- <codeph>CREATE TABLE AS SELECT</codeph> operation. Creating data files that are sorted is
- most useful for Parquet tables, where the metadata stored inside each file includes the
- minimum and maximum values for each column in the file. (The statistics apply to each row
- group within the file; for simplicity, Impala writes a single row group in each file.)
+ are sorted in ascending order in the data files created by each Impala <codeph>INSERT</codeph>
+ or <codeph>CREATE TABLE AS SELECT</codeph> operation. There are two orderings to choose
+ from: <codeph>LEXICAL</codeph> and <codeph>ZORDER</codeph>. The default ordering is
+ <codeph>LEXICAL</codeph>, which can be used for any number of sort columns.
+ <codeph>ZORDER</codeph> can only be used to sort more than one column.
+ </p>
+
+ <p rev="2.9.0 IMPALA-4166">
+ Creating data files that are sorted is most useful for Parquet tables, where the
+ metadata stored inside each file includes the minimum and maximum values for each
+ column in the file. (The statistics apply to each row group within the file;
+ for simplicity, Impala writes a single row group in each file.)
Grouping data values together in relatively narrow ranges within each data file makes it
possible for Impala to quickly skip over data files that do not contain value ranges
indicated in the <codeph>WHERE</codeph> clause of a query, and can improve the
@@ -496,7 +503,7 @@ AS
</p>
<codeblock rev="2.9.0 IMPALA-4166">CREATE TABLE census_data (last_name STRING,
first_name STRING, state STRING, address STRING)
- SORT BY (last_name, state)
+ SORT BY LEXICAL (last_name, state)
STORED AS PARQUET;
</codeblock>
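
For illustration only (not part of the commit): a minimal sketch of how the documented ALTER TABLE ... SORT BY syntax might be used in an Impala session. It reuses the census_data table from the CREATE TABLE example in the diff; the choice of columns and orderings is an assumption made for the example, and no output is shown because it depends on the cluster.

```sql
-- Hypothetical session; census_data is the table from the CREATE TABLE example above.

-- Switch future inserts to Z-ordering on two columns
-- (per the documentation, ZORDER requires more than one sort column).
ALTER TABLE census_data SORT BY ZORDER (last_name, state);

-- Omitting the ordering keyword uses the default, LEXICAL.
-- Existing data files are not rewritten; only subsequent inserts are sorted.
ALTER TABLE census_data SORT BY (last_name, state);

-- Inspect the 'sort.columns' and 'sort.order' table properties.
DESCRIBE FORMATTED census_data;
```

Because changing the sort order does not rewrite existing files, a table altered this way can hold a mix of unsorted and sorted data files until the older data is reloaded.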