Myasuka commented on code in PR #24975:
URL: https://github.com/apache/flink/pull/24975#discussion_r1663472146
##########
docs/content/docs/dev/table/materialized-table/statements.md:
##########
@@ -0,0 +1,344 @@
+---
+title: Statements
+weight: 2
+type: docs
+aliases:
+- /dev/table/materialized-table/statements.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Description
+
+Flink SQL supports the following Materialized Table statements for now:
+- [CREATE MATERIALIZED TABLE](#create-materialized-table)
+- [Alter MATERIALIZED TABLE](#alter-materialized-table)
+- [DROP MATERIALIZED TABLE](#drop-materialized-table)
+
+# CREATE MATERIALIZED TABLE
+
+```
+CREATE MATERIALIZED TABLE [catalog_name.][db_name.]table_name
+
+[ ([ <table_constraint> ]) ]
+
+[COMMENT table_comment]
+
+[PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+
+[WITH (key1=val1, key2=val2, ...)]
+
+FRESHNESS = INTERVAL '<num>' { SECOND | MINUTE | HOUR | DAY }
+
+[REFRESH_MODE = { CONTINUOUS | FULL }]
+
+AS <select_statement>
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+```
+
+### PRIMARY KEY
+
+PRIMARY KEY defines an optional list of columns that uniquely identifies each row within the table. The column as the primary key must be non-null.
+
+### PARTITIONED BY
+
+PARTITIONED BY define an optional list of columns to partition the materialized table. A directory is created for each partition if this materialized table is used as a filesystem sink.
+
+**Example:**
+
+```sql
+-- Create a materialized table and specify the partition field as `ds`.
+CREATE MATERIALIZED TABLE my_materialized_table
+    PARTITIONED BY (ds)
+    FRESHNESS = INTERVAL '1' HOUR
+    AS SELECT
+        ds
+    FROM
+        ...
+```
+
+<span class="label label-danger">Note</span>
+- The partition column must be included in the query statement of the materialized table.
+
+### WITH Options
+
+WITH Options are used to specify the materialized table properties, including [connector options]({{< ref "docs/connectors/table/" >}}) and [time format option]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) for partition fields.
+
+```sql
+-- Create a materialized table, specify the partition field as 'ds', and the corresponding time format as 'yyyy-MM-dd'
+CREATE MATERIALIZED TABLE my_materialized_table
+    PARTITIONED BY (ds)
+    WITH (
+        'format' = 'json',
+        'partition.fields.ds.date-formatter' = 'yyyy-MM-dd'
+    )
+    ...
+```
+
+As shown in the above example, we specified the date-formatter option for the ds partition column. During each scheduling, the scheduling time will be converted to the ds partition value. For example, for a scheduling time of 2024-01-01 00:00:00, only the partition ds = '2024-01-01' will be refreshed.
+
+<span class="label label-danger">Note</span>
+- The `partition.fields.#.date-formatter` option only works in full mode.
+- The field in the [partition.fields.#.date-formatter]({{< ref "docs/dev/table/config" >}}#partition-fields-date-formatter) must be a valid string type partition field.
+
+### FRESHNESS
+
+**FRESHNESS Definition and Refresh Mode Relationship**
+
+FRESHNESS defines the maximum amount of time that the materialized table’s content should lag behind updates to the base tables. It does two things, firstly it determines the [refresh mode]({{< ref "docs/dev/table/materialized-table/overview" >}}#refresh-mode) of the materialized table through [configuration]({{< ref "docs/dev/table/config" >}}#materialized-table-refresh-mode-freshness-threshold), followed by determines the data refresh frequency to meet the actual data freshness requirements.
+
+**Detailed Explanation of FRESHNESS Parameter**
+
+The FRESHNESS parameter range is INTERVAL `'<num>'` { SECOND | MINUTE | HOUR | DAY }. `'<num>'` must be a positive integer, and in FULL mode, `'<num>'` should be a common divisor of the respective time interval.
+
+**Examples:**
+(Assuming `materialized-table.refresh-mode.freshness-threshold` is 30 minutes)
+
+```sql
+-- The corresponding refresh pipeline is a streaming job with a checkpoint interval of 1 second
+FRESHNESS = INTERVAL '1' SECOND

Review Comment:
Since the checkpoint interval is currently bound to the `freshness` setting, I think we should warn users that a stricter freshness puts more pressure on checkpointing. Users can set a longer freshness, and we should also tell them to consider the [changelog state backend](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/state_backends/#enabling-changelog).

##########
docs/content/docs/dev/table/materialized-table/overview.md:
##########
@@ -0,0 +1,65 @@
+---
+title: Overview
+weight: 1
+type: docs
+aliases:
+- /dev/table/materialized-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Introduction
+
+Materialized Table is a new table type introduced in Flink SQL, aimed at simplifying both batch and stream data pipelines, providing a consistent development experience. By specifying data freshness and query when creating Materialized, the engine automatically derives the schema for the materialized table and creates corresponding data refresh pipeline to achieve the specified freshness.
+
+{{< hint warning >}}
+Note: This feature is currently an MVP (“minimum viable product”) feature and only available to [SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}}) and [Standalone]({{< ref "docs/deployment/resource-providers/standalone/overview" >}}) cluster.

Review Comment:
I think this statement might mislead users. I think we could improve it to:

```suggestion
Note: This feature is currently an MVP (“minimum viable product”) feature and only available within [SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}}) when it is connected to a Flink cluster deployed in [Standalone]({{< ref "docs/deployment/resource-providers/standalone/overview" >}}) mode.
```

Same for the Chinese part.
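To make the `partition.fields.#.date-formatter` behaviour in the quoted statements.md hunk concrete, here is a minimal Python sketch (not Flink code) of how a scheduling time maps to the single partition that one full refresh touches. It assumes the `'yyyy-MM-dd'` pattern from the doc's example and hard-codes its strftime equivalent `%Y-%m-%d`; `partition_value` and `partition_filter` are hypothetical helpers for illustration only.

```python
from datetime import datetime


def partition_value(scheduling_time: datetime) -> str:
    """Format a scheduling time as the 'yyyy-MM-dd' pattern would (assumed mapping)."""
    # Java-style pattern 'yyyy-MM-dd' corresponds to strftime '%Y-%m-%d'.
    return scheduling_time.strftime("%Y-%m-%d")


def partition_filter(column: str, scheduling_time: datetime) -> str:
    """Hypothetical helper: the partition predicate one full refresh is limited to."""
    return f"{column} = '{partition_value(scheduling_time)}'"


if __name__ == "__main__":
    # Scheduling time taken from the doc's example: 2024-01-01 00:00:00.
    print(partition_filter("ds", datetime(2024, 1, 1, 0, 0, 0)))  # ds = '2024-01-01'
```

Because the pattern drops the time of day, every scheduling time on the same calendar day maps to the same `ds` partition in this sketch.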

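The FRESHNESS rules in the quoted hunk, together with the first review comment's point that the checkpoint interval follows the freshness, can be sketched as below. This is a Python illustration under stated assumptions, not Flink's implementation: it assumes that a freshness below `materialized-table.refresh-mode.freshness-threshold` (30 minutes, as in the doc's examples) selects CONTINUOUS and anything else FULL, that an explicit `REFRESH_MODE` overrides the threshold, and that the "respective time interval" for FULL mode is 60 for SECOND and MINUTE and 24 for HOUR; DAY is not validated. `plan_refresh` is a hypothetical name.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Assumed value of materialized-table.refresh-mode.freshness-threshold,
# matching the "30 minutes" used in the doc's FRESHNESS examples.
THRESHOLD = timedelta(minutes=30)

UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}

# Assumption: the "respective time interval" that <num> must divide in FULL
# mode. DAY is intentionally left out, so this sketch does not validate it.
ENCLOSING = {"SECOND": 60, "MINUTE": 60, "HOUR": 24}


def plan_refresh(num: int, unit: str,
                 refresh_mode: Optional[str] = None,
                 threshold: timedelta = THRESHOLD) -> Tuple[str, timedelta]:
    """Return (mode, interval) for FRESHNESS = INTERVAL '<num>' <unit>.

    CONTINUOUS: interval is the streaming job's checkpoint interval.
    FULL: interval is the scheduling period of the batch refresh job.
    """
    if num <= 0:
        raise ValueError("<num> must be a positive integer")
    freshness = timedelta(seconds=num * UNIT_SECONDS[unit])
    # An explicit REFRESH_MODE wins; otherwise compare against the threshold.
    mode = refresh_mode or ("CONTINUOUS" if freshness < threshold else "FULL")
    if mode == "FULL":
        enclosing = ENCLOSING.get(unit)
        if enclosing is not None and enclosing % num != 0:
            raise ValueError(f"'{num}' {unit} does not divide {enclosing}")
    return mode, freshness


if __name__ == "__main__":
    for num, unit in [(1, "SECOND"), (1, "HOUR")]:
        mode, interval = plan_refresh(num, unit)
        print(num, unit, "->", mode, interval)
```

With `'1' SECOND` the sketch yields a one-second checkpoint interval, which is the checkpointing load the review comment asks the docs to warn about; a longer freshness directly lengthens that interval.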