LadyForest commented on a change in pull request #66: URL: https://github.com/apache/flink-table-store/pull/66#discussion_r838164316
########## File path: docs/content/docs/development/overview.md ##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of a Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information under 'with'. Such a DDL only establishes an
+implicit relationship with the external system, so we call such a table an
+external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+A managed table is different: the connection information is already
+configured in the session environment, so users only need to focus on the
+business logic when writing the DDL. The DDL is no longer just an
+implicit relationship; creating a table creates the corresponding
+physical storage, and dropping a table deletes it.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Flink SQL includes three types of connectors:
+- Message queues, such as Apache Kafka, used in both the source and
+  intermediate stages of a pipeline to keep latency within seconds.
+- OLAP systems, such as ClickHouse, which receive processed data in a
+  streaming fashion and serve users' ad-hoc queries.
+- Batch storage, such as Apache Hive, which supports the various operations
+  of traditional batch processing, including `INSERT OVERWRITE`.
+
+Flink Table Store provides a table abstraction, so you can use it as if
+it were a table in a database:
+- In Flink `batch` execution mode, it acts like a Hive table and
+  supports the various operations of Batch SQL. Query it to see the
+  latest snapshot.
+- In Flink `streaming` execution mode, it acts like a message queue.
+  Query it to get its changelog stream. It does not drop records due to
+  TTL, and by default a query reads the full snapshot first, followed by
+  the incremental changes.

Review comment:
Different scan mode and log system configurations result in different consumption behavior under streaming mode.

<table class="table table-bordered">
  <thead>
    <tr>
      <th class="text-left" style="width: 20%">Scan Mode</th>
      <th class="text-center" style="width: 5%">Default</th>
      <th class="text-center" style="width: 60%">Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><h5>FULL</h5></td>
      <td>Yes</td>
      <td>When the log system is enabled, FULL scan mode performs a hybrid read: a bounded scan of the file store followed by an unbounded scan of the log store.
When the log system is disabled, FULL scan mode performs an unbounded scan of the file store only.</td>
    </tr>
    <tr>
      <td><h5>LATEST</h5></td>
      <td>No</td>
      <td>When the log system is enabled, LATEST scan mode only reads the log store with an unbounded scan from the latest offset. When the log system is disabled, LATEST scan mode only reads the file store from the latest snapshot with an unbounded scan.</td>
    </tr>
    <tr>
      <td><h5>FROM_TIMESTAMP</h5></td>
      <td>No</td>
      <td>When the log system is enabled, FROM_TIMESTAMP scan mode only reads the log store with an unbounded scan from the user-specified offset. When the log system is disabled, FROM_TIMESTAMP scan mode does not scan any data.</td>
    </tr>
  </tbody>
</table>
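As a sketch of how a reader might select these scan modes, the snippet below sets Flink's streaming execution mode (a standard Flink SQL client setting) and queries a managed table using Flink's dynamic table options hint. The option keys `log.scan` and `log.scan.timestamp-mills` and their values are assumptions for illustration only; the actual keys depend on the Table Store version and configuration documentation.

```sql
-- Standard Flink SQL client setting: run queries in streaming execution mode.
SET 'execution.runtime-mode' = 'streaming';

-- Hypothetical: FULL (default) reads the full snapshot first, then incremental changes.
SELECT * FROM MyTable /*+ OPTIONS('log.scan' = 'full') */;

-- Hypothetical: LATEST reads only new changes from the latest offset.
SELECT * FROM MyTable /*+ OPTIONS('log.scan' = 'latest') */;

-- Hypothetical: FROM_TIMESTAMP reads changes from a user-specified epoch-millisecond offset.
SELECT * FROM MyTable /*+ OPTIONS(
  'log.scan' = 'from-timestamp',
  'log.scan.timestamp-mills' = '1648166400000'
) */;
```

The `/*+ OPTIONS(...) */` hint syntax itself is standard Flink SQL for overriding table options per query.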
