This is an automated email from the ASF dual-hosted git repository.

dbecker pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git


The following commit(s) were added to refs/heads/master by this push:
     new 70c35425d IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax
70c35425d is described below

commit 70c35425d3f8ac68b23fb8e0d08e12ee763965d7
Author: Noemi Pap-Takacs <[email protected]>
AuthorDate: Tue Feb 27 17:43:23 2024 +0100

    IMPALA-12774: [DOCS] Document ALTER TABLE SORT BY syntax
    
    Extended the ALTER TABLE documentation with the SORT BY clause.
    Also added more information about the available and the default
    sort orders to the CREATE TABLE description.
    
    Testing: Built docs locally.
    
    Change-Id: Ieb348d8395a6140f0be200d73e2f22fded9a5116
    Reviewed-on: http://gerrit.cloudera.org:8080/21083
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Daniel Becker <[email protected]>
---
 docs/topics/impala_alter_table.xml  | 23 ++++++++++++++++++++++-
 docs/topics/impala_create_table.xml | 19 +++++++++++++------
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/topics/impala_alter_table.xml b/docs/topics/impala_alter_table.xml
index 3c64ce99d..1d7fc835c 100644
--- a/docs/topics/impala_alter_table.xml
+++ b/docs/topics/impala_alter_table.xml
@@ -511,6 +511,27 @@ ALTER TABLE <varname>table_name</varname> SET SERDEPROPERTIES ('<varname>key1</v
       clauses.
     </p>
 
+    <p>
+      <b>To specify a sort order for new records that are added to the table:</b>
+    </p>
+
+<codeblock>ALTER TABLE <varname>table_name</varname> SORT BY [LEXICAL|ZORDER](<varname>column_name1</varname>[, <varname>column_name2</varname> ...]);</codeblock>
+
+    <p>
+      Specifying the sort order is optional. The default sort order is <codeph>LEXICAL</codeph>.
+      Setting the <codeph>SORT BY</codeph> property will not rewrite existing data files,
+      but subsequent inserts will be ordered. Sorting is always ascending.
+    </p>
+
+    <p>
+      Use the <codeph>DESCRIBE FORMATTED</codeph> statement to see the current sort
+      properties ('<codeph>sort.columns</codeph>' and '<codeph>sort.order</codeph>')
+      for an existing table. They can also be set using <codeph>SET TBLPROPERTIES</codeph>.
+    </p>
+    <p>
+      For details about sort order see <xref href="impala_create_table.xml#create_table"/>.
+    </p>
+
     <p>
       <b>To manually set or update table or column statistics:</b>
     </p>
@@ -736,7 +757,7 @@ optional int32 x [i:1 d:1 r:0]
     <p>
       Use an <codeph>ALTER TABLE ... SET FILEFORMAT</codeph> clause. You can include an optional
       <codeph>PARTITION (<varname>col1</varname>=<varname>val1</varname>,
-      <varname>col2</varname>=<varname>val2</varname>, ...</codeph> clause so that the file
+      <varname>col2</varname>=<varname>val2</varname>, ...</codeph>) clause so that the file
       format is changed for a specific partition rather than the entire table.
     </p>
 
diff --git a/docs/topics/impala_create_table.xml b/docs/topics/impala_create_table.xml
index 50e3ad7e8..263895c40 100644
--- a/docs/topics/impala_create_table.xml
+++ b/docs/topics/impala_create_table.xml
@@ -450,11 +450,18 @@ AS
 
     <p rev="2.9.0 IMPALA-4166">
       The optional <codeph>SORT BY</codeph> clause lets you specify zero or more columns that
-      are sorted in the data files created by each Impala <codeph>INSERT</codeph> or
-      <codeph>CREATE TABLE AS SELECT</codeph> operation. Creating data files that are sorted is
-      most useful for Parquet tables, where the metadata stored inside each file includes the
-      minimum and maximum values for each column in the file. (The statistics apply to each row
-      group within the file; for simplicity, Impala writes a single row group in each file.)
+      are sorted in ascending order in the data files created by each Impala <codeph>INSERT</codeph>
+      or <codeph>CREATE TABLE AS SELECT</codeph> operation. There are two orderings to chose
+      from: <codeph>LEXICAL</codeph> and <codeph>ZORDER</codeph>. The default ordering is
+      <codeph>LEXICAL</codeph>, which can be used for any number of sort columns.
+      <codeph>ZORDER</codeph> can only be used to sort more than one column.
+    </p>
+
+    <p rev="2.9.0 IMPALA-4166">
+      Creating data files that are sorted is most useful for Parquet tables, where the
+      metadata stored inside each file includes the minimum and maximum values for each
+      column in the file. (The statistics apply to each row group within the file;
+      for simplicity, Impala writes a single row group in each file.)
       Grouping data values together in relatively narrow ranges within each data file makes it
       possible for Impala to quickly skip over data files that do not contain value ranges
       indicated in the <codeph>WHERE</codeph> clause of a query, and can improve the
@@ -496,7 +503,7 @@ AS
     </p>
 
 <codeblock rev="2.9.0 IMPALA-4166">CREATE TABLE census_data (last_name STRING, first_name STRING, state STRING, address STRING)
-  SORT BY (last_name, state)
+  SORT BY LEXICAL (last_name, state)
   STORED AS PARQUET;
 </codeblock>
 
