lsyldliu commented on code in PR #26064:
URL: https://github.com/apache/flink/pull/26064#discussion_r1943961810
##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
<span class="label label-danger">Note</span>
- The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition used to refresh the materialized table. It first evolves the table's schema using the schema derived from the new query, and then uses the new query to refresh the table data. Note that, by default, this does not affect historical data.
+
+The modification process depends on the refresh mode of the materialized table:
+
+**Full mode:**
+
+1. Update the `schema` and `query definition` of the materialized table.
+2. The table is refreshed using the new query definition when the next refresh job is triggered:
+- If it is a partitioned table and [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) is correctly set, only the latest partition will be refreshed.
Review Comment:
Please indent.
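
For illustration, the indentation the reviewer is presumably asking for would nest the bullets under step 2 so they render as a sub-list of that item (per CommonMark, continuation content of an ordered-list item needs at least three spaces of indentation):

```markdown
2. The table is refreshed using the new query definition when the next refresh job is triggered:
   - If it is a partitioned table and `partition.fields.#.date-formatter` is correctly set, only the latest partition will be refreshed.
   - Otherwise, the table will be overwritten entirely.
```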

##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
<span class="label label-danger">注意</span>
- REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 子句用于修改刷新物化表的查询定义。它会先使用新查询派生的 `schema` 更新表的 `schema`,然后使用新查询刷新表数据。需要特别强调的是,默认情况下,这不会影响历史数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2. 在刷新任务下次触发执行时,将使用新的查询定义刷新数据:
+- 如果修改的物化表是分区表,且 [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) 配置正确,则仅刷新最新分区。
+- 否则,将刷新整个表的数据。
+
+**持续模式:**
+
+1. 暂停当前的实时刷新任务。
Review Comment:
`实时刷新任务` -> `流式刷新任务` (i.e., "real-time refresh job" should read "streaming refresh job").
##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
<span class="label label-danger">注意</span>
- REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 子句用于修改刷新物化表的查询定义。它会先使用新查询派生的 `schema` 更新表的 `schema`,然后使用新查询刷新表数据。需要特别强调的是,默认情况下,这不会影响历史数据。
Review Comment:
The Chinese translation of `derive` should be `推导`, not `派生`.
##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,69 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
<span class="label label-danger">Note</span>
- The REFRESH operation will start a Flink batch job to refresh the materialized table data.
+## AS <select_statement>
+
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+The `AS <select_statement>` clause allows you to modify the query definition used to refresh the materialized table. It first evolves the table's schema using the schema derived from the new query, and then uses the new query to refresh the table data. Note that, by default, this does not affect historical data.
+
+The modification process depends on the refresh mode of the materialized table:
+
+**Full mode:**
+
+1. Update the `schema` and `query definition` of the materialized table.
+2. The table is refreshed using the new query definition when the next refresh job is triggered:
+- If it is a partitioned table and [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) is correctly set, only the latest partition will be refreshed.
+- Otherwise, the table will be overwritten entirely.
+
+**Continuous mode:**
+
+1. Pause the current running refresh job.
+2. Update the `schema` and `query definition` of the materialized table.
+3. Start a new refresh job to refresh the materialized table:
+- The new refresh job starts from the beginning and does not restore from the previous state.
+- The starting offset of the data source is determined by the connector's default implementation or the `option hint` specified in the query.
Review Comment:
`option hint` -> `dynamic options`. I think we can add a link to help users understand it: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
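
For illustration, the dynamic table options hint the reviewer refers to looks like this; a sketch only, assuming the Kafka connector's `scan.startup.mode` option as the source offset being overridden:

```sql
-- Hypothetical variant of the doc's example query: a dynamic table
-- options hint pins the Kafka source's starting offset explicitly,
-- instead of relying on the connector's default implementation.
ALTER MATERIALIZED TABLE my_materialized_table
AS SELECT
  user_id,
  COUNT(*) AS event_count,
  SUM(amount) AS total_amount
FROM
  kafka_catalog.db1.events /*+ OPTIONS('scan.startup.mode' = 'earliest-offset') */
WHERE
  event_type = 'purchase'
GROUP BY
  user_id;
```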
##########
docs/content.zh/docs/dev/table/materialized-table/statements.md:
##########
@@ -326,6 +328,67 @@ ALTER MATERIALIZED TABLE my_materialized_table REFRESH PARTITION (ds='2024-06-28
<span class="label label-danger">注意</span>
- REFRESH 操作会启动批作业来刷新表的数据。
+## AS <select_statement>
+```sql
+ALTER MATERIALIZED TABLE [catalog_name.][db_name.]table_name AS <select_statement>
+```
+
+`AS <select_statement>` 子句用于修改刷新物化表的查询定义。它会先使用新查询派生的 `schema` 更新表的 `schema`,然后使用新查询刷新表数据。需要特别强调的是,默认情况下,这不会影响历史数据。
+
+具体修改流程取决于物化表的刷新模式:
+
+**全量模式:**
+
+1. 更新物化表的 `schema` 和查询定义。
+2. 在刷新任务下次触发执行时,将使用新的查询定义刷新数据:
+- 如果修改的物化表是分区表,且 [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) 配置正确,则仅刷新最新分区。
+- 否则,将刷新整个表的数据。
+
+**持续模式:**
+
+1. 暂停当前的实时刷新任务。
+2. 更新物化表的 `schema` 和查询定义。
+3. 启动新的流式任务以刷新物化表:
+- 新的流式任务会从头开始,而不会从之前的流式任务状态恢复。
+- 数据源的起始位点会由连接器的默认实现或查询中设置的 `option hint` 决定。
+
+**示例:**
+
+```sql
+-- 原始物化表定义
+CREATE MATERIALIZED TABLE my_materialized_table
+ FRESHNESS = INTERVAL '10' SECOND
+ AS
+ SELECT
+ user_id,
+ COUNT(*) AS event_count,
+ SUM(amount) AS total_amount
+ FROM
+ kafka_catalog.db1.events
+ WHERE
+ event_type = 'purchase'
+ GROUP BY
+ user_id;
+
+-- 修改现有物化表的查询
+ALTER MATERIALIZED TABLE my_materialized_table
+AS SELECT
+ user_id,
+ COUNT(*) AS event_count,
+ SUM(amount) AS total_amount,
+ AVG(amount) AS avg_amount -- 在末尾添加新的可为空列
+FROM
+ kafka_catalog.db1.events
+WHERE
+ event_type = 'purchase'
+GROUP BY
+ user_id;
+```
+
+<span class="label label-danger">注意</span>
+- Schema 演进当前仅支持在原表 schema 尾部添加`可空列`。
Review Comment:
`添加` -> `追加` (i.e., "add" should read "append").
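
To make the constraint in that note concrete, a hedged sketch of a change that would presumably be rejected, assuming (as the note states) that schema evolution only supports appending nullable columns at the tail of the original schema:

```sql
-- Presumably rejected: the new query drops an existing column
-- (total_amount), which is not an append-only schema change --
-- unlike the doc's example above, which only appends the nullable
-- avg_amount column at the tail.
ALTER MATERIALIZED TABLE my_materialized_table
AS SELECT
  user_id,
  COUNT(*) AS event_count
FROM
  kafka_catalog.db1.events
WHERE
  event_type = 'purchase'
GROUP BY
  user_id;
```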
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]