(flink) branch master updated: [FLINK-35790][docs] Update docs for new schema definition in CTAS and RTAS

twalthr Tue, 27 Aug 2024 07:31:34 -0700

This is an automated email from the ASF dual-hosted git repository.

twalthr pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git



The following commit(s) were added to refs/heads/master by this push:
     new 8c116def0b5 [FLINK-35790][docs] Update docs for new schema definition 
in CTAS and RTAS
8c116def0b5 is described below

commit 8c116def0b514236da1e95ef20e81dc364aa2b71
Author: Sergio Peña <[email protected]>
AuthorDate: Tue Aug 27 09:31:19 2024 -0500

    [FLINK-35790][docs] Update docs for new schema definition in CTAS and RTAS
---
 docs/content.zh/docs/dev/table/sql/create.md | 111 +++++++++++++++++++++++++--
 docs/content/docs/dev/table/sql/create.md    | 107 ++++++++++++++++++++++++--
 2 files changed, 206 insertions(+), 12 deletions(-)

diff --git a/docs/content.zh/docs/dev/table/sql/create.md 
b/docs/content.zh/docs/dev/table/sql/create.md
index 7d8074eaee7..717c0050230 100644
--- a/docs/content.zh/docs/dev/table/sql/create.md
+++ b/docs/content.zh/docs/dev/table/sql/create.md
@@ -585,12 +585,71 @@ CREATE TABLE my_ctas_table (
 INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 
10) = 0;
 ```
 
+The `CREATE` part allows you to specify explicit columns. The resulting table 
schema will contain the columns defined in the `CREATE` part first followed by 
the columns from the `SELECT` part. Columns named in both parts, in the 
`CREATE` and `SELECT` parts, keep the same column position as defined in the 
`SELECT` part. The data type of `SELECT` columns can also be overridden if 
specified in the `CREATE` part.
+
+Consider the example statement below:
+
+```sql
+CREATE TABLE my_ctas_table (
+    desc STRING,
+    quantity DOUBLE,   
+    cost AS price * quantity,
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
+) WITH (
+    'connector' = 'kafka',
+    ...
+) AS SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The resulting table `my_ctas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```
+CREATE TABLE my_ctas_table (
+    desc STRING,
+    cost AS price * quantity,
+    id BIGINT,
+    price DOUBLE,
+    quantity DOUBLE,
+    order_time TIMESTAMP(3),
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH (
+    'connector' = 'kafka',
+    ...
+);
+
+INSERT INTO my_ctas_table (id, price, quantity, order_time)
+    SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The `CREATE` part also lets you specify primary keys and distribution 
strategies. Notice that primary keys work only on `NOT NULL` columns. 
Currently, primary keys only allow you to define columns from the `SELECT` part 
which may be `NOT NULL`. The `CREATE` part does not allow `NOT NULL` column 
definitions.
+
+Consider the example statement below where `id` is a not null column in the 
`SELECT` part:
+
+```sql
+CREATE TABLE my_ctas_table (
+    PRIMARY KEY (id) NOT ENFORCED
+) DISTRIBUTED BY (id) INTO 4 buckets 
+AS SELECT id, name FROM source_table;
+```
+
+The resulting table `my_ctas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```
+CREATE TABLE my_ctas_table (
+    id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED,
+    name STRING 
+) DISTRIBUTED BY (id) INTO 4 buckets;
+
+INSERT INTO my_ctas_table SELECT id, name FROM source_table;
+```
+
+**Note:** CTAS has these restrictions:
+* Does not support creating a temporary table yet.
+* Does not support creating partitioned table yet.
+
 **注意：** CTAS 有如下约束：
 * 暂不支持创建临时表。
-* 暂不支持指定列信息。
-* 暂不支持指定 Watermark。
 * 暂不支持创建分区表。
-* 暂不支持主键约束。
 
 **注意：** 默认情况下，CTAS 是非原子性的，这意味着如果在向表中插入数据时发生错误，该表不会被自动删除。
 
@@ -605,7 +664,13 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM 
source_table WHERE mod(id, 1
 ## [CREATE OR] REPLACE TABLE
 ```sql
 [CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name
+  [(
+    { <physical_column_definition> | <metadata_column_definition> | 
<computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )]
 [COMMENT table_comment]
+[ <distribution> ]
 WITH (key1=val1, key2=val2, ...)
 AS select_query
 ```
@@ -645,12 +710,46 @@ CREATE TABLE my_rtas_table (
 INSERT INTO my_rtas_table SELECT id, name, age FROM source_table WHERE mod(id, 
10) = 0;
 ```
 
+Similar to `CREATE TABLE AS`, `REPLACE TABLE AS` allows you to specify 
explicit columns, watermarks, primary keys and distribution strategies. The 
resulting table schema is built from the `CREATE` part first followed by the 
columns from the `SELECT` part. Columns named in both parts, in the `CREATE` 
and `SELECT` parts, keep the same column position as defined in the `SELECT` 
part. The data type of `SELECT` columns can also be overridden if specified in 
the `CREATE` part.
+
+Consider the example statement below:
+
+```sql
+REPLACE TABLE my_rtas_table (
+    desc STRING,
+    quantity DOUBLE,   
+    cost AS price * quantity,
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
+    PRIMARY KEY (id) NOT ENFORCED
+) DISTRIBUTED BY (id) INTO 4 buckets
+AS SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The resulting table `my_rtas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```sql
+DROP TABLE my_rtas_table;
+
+CREATE TABLE my_rtas_table (
+    desc STRING,
+    cost AS price * quantity,
+    id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED,
+    price DOUBLE,
+    quantity DOUBLE,
+    order_time TIMESTAMP(3),
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH (
+    'connector' = 'kafka',
+    ...
+);
+
+INSERT INTO my_rtas_table (id, price, quantity, order_time)
+    SELECT id, price, quantity, order_time FROM source_table;
+```
+
 **注意：** RTAS 有如下约束：
 * 暂不支持替换临时表。
-* 暂不支持指定列信息。
-* 暂不支持指定 Watermark。
 * 暂不支持创建分区表。
-* 暂不支持主键约束。
 
 **注意：** 默认情况下，RTAS 是非原子性的，这意味着如果在向表中插入数据时发生错误，该表不会被自动删除或还原成原来的表。
 **注意：** RTAS 会先删除表，然后创建表并写入数据。但如果表是在基于内存的 Catalog 里，删除表只会将其从 Catalog 
里移除，并不会移除物理表中的数据。因此，执行RTAS语句之前的数据仍然存在。
diff --git a/docs/content/docs/dev/table/sql/create.md 
b/docs/content/docs/dev/table/sql/create.md
index c03f67878de..2957a656790 100644
--- a/docs/content/docs/dev/table/sql/create.md
+++ b/docs/content/docs/dev/table/sql/create.md
@@ -585,12 +585,67 @@ CREATE TABLE my_ctas_table (
 INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 
10) = 0;
 ```
 
+The `CREATE` part allows you to specify explicit columns. The resulting table 
schema will contain the columns defined in the `CREATE` part first followed by 
the columns from the `SELECT` part. Columns named in both parts, in the 
`CREATE` and `SELECT` parts, keep the same column position as defined in the 
`SELECT` part. The data type of `SELECT` columns can also be overridden if 
specified in the `CREATE` part.
+
+Consider the example statement below:
+
+```sql
+CREATE TABLE my_ctas_table (
+    desc STRING,
+    quantity DOUBLE,   
+    cost AS price * quantity,
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
+) WITH (
+    'connector' = 'kafka',
+    ...
+) AS SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The resulting table `my_ctas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```
+CREATE TABLE my_ctas_table (
+    desc STRING,
+    cost AS price * quantity,
+    id BIGINT,
+    price DOUBLE,
+    quantity DOUBLE,
+    order_time TIMESTAMP(3),
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH (
+    'connector' = 'kafka',
+    ...
+);
+
+INSERT INTO my_ctas_table (id, price, quantity, order_time)
+    SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The `CREATE` part also lets you specify primary keys and distribution 
strategies. Notice that primary keys work only on `NOT NULL` columns. 
Currently, primary keys only allow you to define columns from the `SELECT` part 
which may be `NOT NULL`. The `CREATE` part does not allow `NOT NULL` column 
definitions.
+
+Consider the example statement below where `id` is a not null column in the 
`SELECT` part:
+
+```sql
+CREATE TABLE my_ctas_table (
+    PRIMARY KEY (id) NOT ENFORCED
+) DISTRIBUTED BY (id) INTO 4 buckets 
+AS SELECT id, name FROM source_table;
+```
+
+The resulting table `my_ctas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```
+CREATE TABLE my_ctas_table (
+    id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED,
+    name STRING 
+) DISTRIBUTED BY (id) INTO 4 buckets;
+
+INSERT INTO my_ctas_table SELECT id, name FROM source_table;
+```
+
 **Note:** CTAS has these restrictions:
 * Does not support creating a temporary table yet.
-* Does not support specifying explicit columns yet.
-* Does not support specifying explicit watermark yet.
 * Does not support creating partitioned table yet.
-* Does not support specifying primary key constraints yet.
 
 **Note:** By default, CTAS is non-atomic which means the table created won't 
be dropped automatically if occur errors while inserting data into the table.
 
@@ -605,7 +660,13 @@ If you want to enable atomicity for CTAS, then you should 
make sure:
 ## [CREATE OR] REPLACE TABLE
 ```sql
 [CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name
+  [(
+    { <physical_column_definition> | <metadata_column_definition> | 
<computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )]
 [COMMENT table_comment]
+[ <distribution> ]
 WITH (key1=val1, key2=val2, ...)
 AS select_query
 ```
@@ -645,13 +706,47 @@ CREATE TABLE my_rtas_table (
 INSERT INTO my_rtas_table SELECT id, name, age FROM source_table WHERE mod(id, 
10) = 0;
 ```
 
+Similar to `CREATE TABLE AS`, `REPLACE TABLE AS` allows you to specify 
explicit columns, watermarks, primary keys and distribution strategies. The 
resulting table schema is built from the `CREATE` part first followed by the 
columns from the `SELECT` part. Columns named in both parts, in the `CREATE` 
and `SELECT` parts, keep the same column position as defined in the `SELECT` 
part. The data type of `SELECT` columns can also be overridden if specified in 
the `CREATE` part.
+
+Consider the example statement below:
+
+```sql
+REPLACE TABLE my_rtas_table (
+    desc STRING,
+    quantity DOUBLE,   
+    cost AS price * quantity,
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
+    PRIMARY KEY (id) NOT ENFORCED
+) DISTRIBUTED BY (id) INTO 4 buckets
+AS SELECT id, price, quantity, order_time FROM source_table;
+```
+
+The resulting table `my_rtas_table` will be equivalent to create the following 
table and insert the data with the following statement:
+
+```sql
+DROP TABLE my_rtas_table;
+
+CREATE TABLE my_rtas_table (
+    desc STRING,
+    cost AS price * quantity,
+    id BIGINT NOT NULL PRIMARY KEY NOT ENFORCED,
+    price DOUBLE,
+    quantity DOUBLE,
+    order_time TIMESTAMP(3),
+    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH (
+    'connector' = 'kafka',
+    ...
+);
+
+INSERT INTO my_rtas_table (id, price, quantity, order_time)
+    SELECT id, price, quantity, order_time FROM source_table;
+```
+
 **Note:** RTAS has these restrictions:
 
 * Does not support replacing a temporary table yet.
-* Does not support specifying explicit columns yet.
-* Does not support specifying explicit watermark yet.
 * Does not support creating partitioned table yet.
-* Does not support specifying primary key constraints yet.
 
 **Note:** By default, RTAS is non-atomic which means the table won't be 
dropped or restored to its origin automatically if occur errors while inserting 
data into the table.
 **Note:** RTAS will drop the table first, then create the table and insert the 
data. But if the table is in the in-memory catalog, dropping table will only 
remove it from the catalog without removing the data in the physical table. So, 
the data before executing RTAS statement will still exist.

(flink) branch master updated: [FLINK-35790][docs] Update docs for new schema definition in CTAS and RTAS

Reply via email to