lsyldliu commented on code in PR #24975:
URL: https://github.com/apache/flink/pull/24975#discussion_r1665348020
##########
docs/content/docs/dev/table/materialized-table/overview.md:
##########
@@ -0,0 +1,65 @@
+---
+title: Overview
+weight: 1
+type: docs
+aliases:
+- /dev/table/materialized-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Introduction
+
+Materialized Table is a new table type introduced in Flink SQL, aimed at simplifying both batch and stream data pipelines, providing a consistent development experience. By specifying data freshness and query when creating Materialized Table, the engine automatically derives the schema for the materialized table and creates corresponding data refresh pipeline to achieve the specified freshness.
+
+{{< hint warning >}}
+**Note**: This feature is currently an MVP (“minimum viable product”) feature and only available within [SQL Gateway]({{< ref "docs/dev/table/sql-gateway/overview" >}}) which connected to a [Standalone]({{< ref "docs/deployment/resource-providers/standalone/overview" >}}) deployed Flink cluster.
+{{< /hint >}}
+
+# Core Concepts
+
+Materialized Table encompass the following core concepts: Data Freshness, Refresh Mode, Query Definition and Schema.
+
+## Data Freshness
+
+Data freshness defines the maximum amount of time that the materialized table’s content should lag behind updates to the base tables. Freshness is not a guarantee. Instead, it is a target that Flink attempts to meet. Data in materialized table is refreshed as closely as possible within the freshness.
+
+Data freshness is a crucial attribute of a materialized table, serving two main purposes:
+- **Determining the Refresh Mode**. Currently, there are CONTINUOUS and FULL modes. For details on how to determine the refresh mode, refer to the [materialized-table.refresh-mode.freshness-threshold]({{< ref "docs/dev/table/config" >}}#materialized-table-refresh-mode-freshness-threshold) configuration item.
+  - CONTINUOUS mode: Launches a Flink streaming job that continuously refreshes the materialized table data.
+  - FULL mode: The workflow scheduler periodically triggers a Flink batch job to refresh the materialized table data.
+- **Determining the Refresh Frequency**.
+  - In CONTINUOUS mode, data freshness is converted into the `checkpoint` interval of the Flink streaming job currently.
+  - In FULL mode, data freshness is converted into the scheduling cycle of the workflow, e.g. cron expression.
+
+## Refresh Mode
+
+There are two refresh modes: FULL and CONTINUOUS. By default, the refresh mode is inferred based on data freshness. Users can explicitly specify the refresh mode for specific business scenarios, which will take precedence over the data freshness inference.
+
+- **CONTINUOUS Mode**: The Flink streaming job incrementally updates the materialized table data, When downstream data is only visible after the checkpoint is completed, the data refresh frequency matches the job's checkpoint interval.

Review Comment:
ditto
