LadyForest commented on a change in pull request #66:
URL: https://github.com/apache/flink-table-store/pull/66#discussion_r837342493



##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.

Review comment:
       I think we'd better add a hint for the case that doesn't use the pk as the partition key, too
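   For the existing note, an illustrative sketch may help (table and column names are made up):
   ```sql
   -- OK: the primary key (dt, user_id) contains the partition field dt
   CREATE TABLE T1 (
     user_id BIGINT,
     dt STRING,
     PRIMARY KEY (dt, user_id) NOT ENFORCED
   ) PARTITIONED BY (dt);

   -- Not unique: the primary key (user_id) misses the partition field dt,
   -- so the same user_id may appear in several partitions
   CREATE TABLE T2 (
     user_id BIGINT,
     dt STRING,
     PRIMARY KEY (user_id) NOT ENFORCED
   ) PARTITIONED BY (dt);
   ```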

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value

Review comment:
       `Table options do not contain`

##########
File path: docs/content/docs/development/distribution.md
##########
@@ -0,0 +1,104 @@
+---
+title: "Distribution"
+weight: 3
+type: docs
+aliases:
+- /development/distribution.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Distribution
+
+The data distribution of Table Store consists of three concepts:
+Partition, Bucket, and Primary Key.
+
+```sql
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING,
+  PRIMARY KEY (dt, user_id) NOT ENFORCED
+) PARTITION BY (dt) WITH (
+  'bucket' = '4'
+);
+```
+
+For example, the `MyTable` table above has its data distribution
+in the following order:
+- Partition: isolating different data based on partition fields.
+- Bucket: Within a single partition, distributed into 4 different
+  buckets based on the hash value of the primary key.
+- Primary key: Within a single bucket, sorted by primary key to
+  build LSM structure.
+
+## Partition
+
+Table Store has the same concept of partitioning as Apache Hive,
+which will separate the data and various operations can be managed
+by partition as a management unit.

Review comment:
       What about
   ```
   Table Store adopts the same partitioning concept as Apache Hive to separate data, and thus various operations can be managed by partition as a management unit.
   ```

##########
File path: docs/content/docs/development/distribution.md
##########
@@ -0,0 +1,104 @@
+---
+title: "Distribution"
+weight: 3
+type: docs
+aliases:
+- /development/distribution.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Distribution
+
+The data distribution of Table Store consists of three concepts:
+Partition, Bucket, and Primary Key.
+
+```sql
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING,
+  PRIMARY KEY (dt, user_id) NOT ENFORCED
+) PARTITION BY (dt) WITH (
+  'bucket' = '4'
+);
+```
+
+For example, the `MyTable` table above has its data distribution
+in the following order:
+- Partition: isolating different data based on partition fields.
+- Bucket: Within a single partition, distributed into 4 different
+  buckets based on the hash value of the primary key.
+- Primary key: Within a single bucket, sorted by primary key to
+  build LSM structure.
+
+## Partition
+
+Table Store has the same concept of partitioning as Apache Hive,
+which will separate the data and various operations can be managed
+by partition as a management unit.
+
+Partitioned filtering is the most effective way to improve performance,
+your query statements should contain partition filtering conditions.
+
+## Bucket
+
+The record is hashed into different buckets according to the
+primary key or the whole row (without primary key).
+
+The number of buckets is very important as it determines the
+worst-case maximum processing parallelism. But it should not
+be too big, otherwise it will create a lot of small files.
+
+In general, the desired file size is 128 MB, the recommended data
+to be kept on disk in each sub-bucket is about 1 GB.
+
+## Primary Key
+
+The primary key is unique, and the bucket will be sorted by the
+primary key. When no primary key is defined, data will be sorted
+by all fields. Using this ordered feature, you can achieve very
+high performance by filtering conditions on primary key.
+
+The setting of the primary key is very critical, especially the
+setting of the composite primary key, in which the more in front
+of the field the more effective the filtering is. For example:
+

Review comment:
       What about
   `The primary key's choice is critical, especially when setting the composite primary key. A rule of thumb is to put the most frequently queried field in the front.`
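   For example, a sketch of this rule of thumb (hypothetical table, assuming queries usually filter on user_id):
   ```sql
   -- user_id is filtered on most often, so it leads the composite key
   CREATE TABLE AccessLog (
     user_id BIGINT,
     item_id BIGINT,
     PRIMARY KEY (user_id, item_id) NOT ENFORCED
   );

   -- benefits from the sort order on the leading key field
   SELECT * FROM AccessLog WHERE user_id = 42;
   ```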

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.
+- Message queue, such as Apache Kafka, it is used in both source and 
+  intermediate stages in this pipeline, to guarantee the latency stay
+  within seconds.
+- OLAP system, such as Clickhouse, it receives processed data in
+  streaming fashion and serving user’s ad-hoc queries. 
+- Batch storage, such as Apache Hive, it supports various operations
+  of the traditional batch, including `INSERT OVERWRITE`.

Review comment:
       `traditional batch processing`

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an

Review comment:
       Nit: `creating the DDL` sounds strange. It could either be `writing the table creation DDL` or just `writing DDL`

##########
File path: docs/content/docs/development/distribution.md
##########
@@ -0,0 +1,104 @@
+---
+title: "Distribution"
+weight: 3
+type: docs
+aliases:
+- /development/distribution.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Distribution
+
+The data distribution of Table Store consists of three concepts:
+Partition, Bucket, and Primary Key.
+
+```sql
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING,
+  PRIMARY KEY (dt, user_id) NOT ENFORCED
+) PARTITION BY (dt) WITH (
+  'bucket' = '4'
+);
+```
+
+For example, the `MyTable` table above has its data distribution
+in the following order:
+- Partition: isolating different data based on partition fields.
+- Bucket: Within a single partition, distributed into 4 different
+  buckets based on the hash value of the primary key.
+- Primary key: Within a single bucket, sorted by primary key to
+  build LSM structure.
+
+## Partition
+
+Table Store has the same concept of partitioning as Apache Hive,
+which will separate the data and various operations can be managed
+by partition as a management unit.
+
+Partitioned filtering is the most effective way to improve performance,
+your query statements should contain partition filtering conditions.

Review comment:
       `your query statements should contain partition filtering conditions.` sounds too commanding in tone. What about
   `Your query statements should contain partition filtering conditions as much as possible.`
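   For instance, with the `MyTable` example from this page, a query like the following prunes all partitions except one (the literal value is made up):
   ```sql
   -- only the dt = '2022-03-26' partition is scanned
   SELECT * FROM MyTable WHERE dt = '2022-03-26' AND behavior = 'click';
   ```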

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value
+that is managed table. Creating a table will create the
+corresponding physical storage.
+
+When the corresponding physical storage already exists,
+such as a file directory or kafka topic:
+- If you want to reuse it, use `CREATE TABLE IF NOT EXISTS`
+- If you don't want to reuse it, `DROP TABLE IF EXISTS`
+  or delete it yourself.
+
+It is recommended that you use a persistent catalog, such as
+`HiveCatalog`, otherwise make sure you create the table with
+the same options each time.
+
+## Session Options
+
+To create a managed table, you need to set the required
+session options in advance. Session options are only
+valid when creating a table, not when reading or
+writing a table.
+
+You can set session options in the following two ways:
+- Edit `flink-conf.yaml`.
+- Via `TableEnvironment.getConfig().set`.
+
+The difference between session options and table options
+is that the session option needs to be prefixed with
+"table-store", for example, for the `bucket` option:
+- session: `SET 'table-store.bucket' = '4';`
+- table: `WITH ('bucket' = '4')`

Review comment:
       Nit: 
   - set at session level: `SET 'table-store.bucket' = '4';`
   - set at per-table level: `CREATE TABLE ... WITH ('bucket' = '4')`
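   A minimal sketch of the two levels side by side (table name is made up):
   ```sql
   -- session level: a default for tables created later in this session
   SET 'table-store.bucket' = '4';

   -- per-table level: set directly in the table options
   CREATE TABLE T (a INT) WITH ('bucket' = '4');
   ```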

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value
+that is managed table. Creating a table will create the
+corresponding physical storage.
+
+When the corresponding physical storage already exists,
+such as a file directory or kafka topic:
+- If you want to reuse it, use `CREATE TABLE IF NOT EXISTS`
+- If you don't want to reuse it, `DROP TABLE IF EXISTS`
+  or delete it yourself.
+
+It is recommended that you use a persistent catalog, such as
+`HiveCatalog`, otherwise make sure you create the table with
+the same options each time.
+
+## Session Options
+
+To create a managed table, you need to set the required
+session options in advance. Session options are only
+valid when creating a table, not when reading or
+writing a table.
+

Review comment:
       Nit: `Session options are only valid when creating a table, not interfering with reading or writing the table.`

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value
+that is managed table. Creating a table will create the

Review comment:
       `Table options does not contain the 'connector' key value that is managed table.` may have a grammar issue.
   
   * If you want to use an attributive clause, `that/which` should follow the antecedent it describes. See [wiki](https://zh.wikipedia.org/wiki/%E8%8B%B1%E8%AF%AD%E5%85%B3%E7%B3%BB%E4%BB%8E%E5%8F%A5)
   * If we ignore the attributive, the clause turns out to be "Table options ~~does not contain the 'connector' key value that~~ is managed table." But `Table options` is not `table`, so `is` is inaccurate; it should be `represent/indicate`
   
   Suggested changes =>
   
   `Table options that do not contain the 'connector' key and value represent a managed table.`

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value
+that is managed table. Creating a table will create the
+corresponding physical storage.
+
+When the corresponding physical storage already exists,
+such as a file directory or kafka topic:
+- If you want to reuse it, use `CREATE TABLE IF NOT EXISTS`
+- If you don't want to reuse it, `DROP TABLE IF EXISTS`
+  or delete it yourself.
+
+It is recommended that you use a persistent catalog, such as
+`HiveCatalog`, otherwise make sure you create the table with
+the same options each time.
+
+## Session Options
+
+To create a managed table, you need to set the required
+session options in advance. Session options are only
+valid when creating a table, not when reading or
+writing a table.
+
+You can set session options in the following two ways:
+- Edit `flink-conf.yaml`.
+- Via `TableEnvironment.getConfig().set`.
+
+The difference between session options and table options
+is that the session option needs to be prefixed with
+"table-store", for example, for the `bucket` option:

Review comment:
       Nit: The difference between session options and table options is that the session option needs to be prefixed with `table-store`. Take the `bucket` option for example:

##########
File path: docs/content/docs/development/distribution.md
##########
@@ -0,0 +1,104 @@
+---
+title: "Distribution"
+weight: 3
+type: docs
+aliases:
+- /development/distribution.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Distribution
+
+The data distribution of Table Store consists of three concepts:
+Partition, Bucket, and Primary Key.
+
+```sql
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING,
+  PRIMARY KEY (dt, user_id) NOT ENFORCED
+) PARTITION BY (dt) WITH (
+  'bucket' = '4'
+);
+```
+
+For example, the `MyTable` table above has its data distribution
+in the following order:
+- Partition: isolating different data based on partition fields.
+- Bucket: Within a single partition, distributed into 4 different
+  buckets based on the hash value of the primary key.
+- Primary key: Within a single bucket, sorted by primary key to
+  build LSM structure.
+
+## Partition
+
+Table Store has the same concept of partitioning as Apache Hive,
+which will separate the data and various operations can be managed
+by partition as a management unit.
+
+Partitioned filtering is the most effective way to improve performance,
+your query statements should contain partition filtering conditions.
+
+## Bucket
+
+The record is hashed into different buckets according to the
+primary key or the whole row (without primary key).
+
+The number of buckets is very important as it determines the
+worst-case maximum processing parallelism. But it should not
+be too big, otherwise it will create a lot of small files.
+
+In general, the desired file size is 128 MB, the recommended data
+to be kept on disk in each sub-bucket is about 1 GB.
+
+## Primary Key
+
+The primary key is unique, and the bucket will be sorted by the
+primary key. When no primary key is defined, data will be sorted
+by all fields. Using this ordered feature, you can achieve very
+high performance by filtering conditions on primary key.
+

Review comment:
       What about
   `Flink Table Store imposes an ordering of data, which means the system will sort the primary key within each bucket and sort buckets by primary key within each partition. All fields will be used to sort if no primary key is defined. Using this feature, you can achieve high performance by adding filter conditions on the primary key.`
   
   Some modification points:
   * remove `very`, which sounds a little subjective
   * rephrase the sentence to make it more fluent
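   To illustrate the suggested wording, a hypothetical query that adds filter conditions on the primary key:
   ```sql
   -- with PRIMARY KEY (dt, user_id), filtering on the key lets the
   -- sorted layout within each bucket skip irrelevant data
   SELECT * FROM MyTable WHERE dt = '2022-03-26' AND user_id = 42;
   ```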

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```

Review comment:
       Nit: add a comment
   ```sql
   -- an external table ddl
   CREATE TABLE KafkaTable (
     `user_id` BIGINT,
     `item_id` BIGINT,
     `behavior` STRING
   ) WITH (
     'connector' = 'kafka',
     'topic' = 'user_behavior',
     'properties.bootstrap.servers' = 'localhost:9092',
     'properties.group.id' = 'testGroup',
     'scan.startup.mode' = 'earliest-offset',
     'format' = 'csv'
   );
   ```

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.
+- Message queue, such as Apache Kafka, it is used in both source and 
+  intermediate stages in this pipeline, to guarantee the latency stay
+  within seconds.
+- OLAP system, such as Clickhouse, it receives processed data in
+  streaming fashion and serving user’s ad-hoc queries. 
+- Batch storage, such as Apache Hive, it supports various operations
+  of the traditional batch, including `INSERT OVERWRITE`.
+
+Flink Table Store provides table abstraction, you can use it as if
+it were a table in a database:
+- In Flink `batch` execution mode, it acts like a Hive table and
+  supports various operations of Batch SQL. Query it to see the
+  latest snapshot.
+- In Flink `streaming` execution mode, it acts like a message queue.
+  Query it to get its change log stream. It does not drop a record

Review comment:
       `Query it to get stream changelog`?

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.

Review comment:
       There are three types of connectors in Flink SQL.

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```

Review comment:
       add a comment
   ```sql
   -- a managed table ddl
   CREATE TABLE MyTable (
     `user_id` BIGINT,
     `item_id` BIGINT,
     `behavior` STRING
   );
   ```

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.
+- Message queue, such as Apache Kafka, it is used in both source and 
+  intermediate stages in this pipeline, to guarantee the latency stay
+  within seconds.
+- OLAP system, such as Clickhouse, it receives processed data in
+  streaming fashion and serving user’s ad-hoc queries. 
+- Batch storage, such as Apache Hive, it supports various operations
+  of the traditional batch, including `INSERT OVERWRITE`.
+
+Flink Table Store provides table abstraction, you can use it as if
+it were a table in a database:
+- In Flink `batch` execution mode, it acts like a Hive table and
+  supports various operations of Batch SQL. Query it to see the
+  latest snapshot.
+- In Flink `streaming` execution mode, it acts like a message queue.
+  Query it to get its change log stream. It does not drop a record
+  because of TTL, and querying it by default will read the full amount
+  of data first, followed by logging the incremental data.

Review comment:
       Let's articulate more clearly for streaming mode.
   
   **Before**
   > In Flink `streaming` execution mode, it acts like a message queue. Query it to get its change log stream. It does not drop a record because of TTL, and querying it by default will read the full amount of data first, followed by logging the incremental data.
   
   **After**
   > Flink's `streaming` execution mode acts like querying a stream changelog from a message queue where historical data never expires.
   
   And the sentence should be followed by a table to describe different behavior in detail.

##########
File path: docs/content/docs/development/create-table.md
##########
@@ -0,0 +1,153 @@
+---
+title: "Create Table"
+weight: 2
+type: docs
+aliases:
+- /development/create-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# CREATE statement
+
+```sql
+CREATE TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name
+  (
+    { <physical_column_definition> | <computed_column_definition> }[ , ...n]
+    [ <watermark_definition> ]
+    [ <table_constraint> ][ , ...n]
+  )
+  [PARTITIONED BY (partition_column_name1, partition_column_name2, ...)]
+  WITH (key1=val1, key2=val2, ...)
+   
+<physical_column_definition>:
+  column_name column_type [ <column_constraint> ] [COMMENT column_comment]
+  
+<column_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY NOT ENFORCED
+
+<table_constraint>:
+  [CONSTRAINT constraint_name] PRIMARY KEY (column_name, ...) NOT ENFORCED
+
+<computed_column_definition>:
+  column_name AS computed_column_expression [COMMENT column_comment]
+
+<watermark_definition>:
+  WATERMARK FOR rowtime_column_name AS watermark_strategy_expression
+```
+
+{{< hint info >}}
+__Note:__ To ensure the uniqueness of the primary key, the
+primary key must contain the partition field.
+{{< /hint >}}
+
+{{< hint info >}}
+__Note:__ Metadata column is not supported yet.
+{{< /hint >}}
+
+Table options does not contain the 'connector' key value
+that is managed table. Creating a table will create the
+corresponding physical storage.
+
+When the corresponding physical storage already exists,
+such as a file directory or kafka topic:
+- If you want to reuse it, use `CREATE TABLE IF NOT EXISTS`
+- If you don't want to reuse it, `DROP TABLE IF EXISTS`
+  or delete it yourself.
+
+It is recommended that you use a persistent catalog, such as
+`HiveCatalog`, otherwise make sure you create the table with
+the same options each time.
+
+## Session Options
+
+To create a managed table, you need to set the required
+session options in advance. Session options are only
+valid when creating a table, not when reading or
+writing a table.
+
+You can set session options in the following two ways:
+- Edit `flink-conf.yaml`.
+- Via `TableEnvironment.getConfig().set`.
+
+The difference between session options and table options
+is that the session option needs to be prefixed with
+"table-store", for example, for the `bucket` option:
+- session: `SET 'table-store.bucket' = '4';`
+- table: `WITH ('bucket' = '4')`
+
+Important options include the following:
+
+<table class="table table-bordered">
+    <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Option</th>
+      <th class="text-center" style="width: 5%">Required</th>
+      <th class="text-center" style="width: 5%">Default</th>
+      <th class="text-center" style="width: 10%">Type</th>
+      <th class="text-center" style="width: 60%">Description</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>file.path</h5></td>
+      <td>Yes</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>The root file path of the table store in the filesystem.</td>
+    </tr>
+    <tr>
+      <td><h5>bucket</h5></td>
+      <td>Yes</td>
+      <td style="word-wrap: break-word;">1</td>
+      <td>Integer</td>
+      <td>The bucket number for table store.</td>
+    </tr>
+    <tr>
+      <td><h5>log.system</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>The log system used to keep changes of the table, supports 'kafka'.</td>
+    </tr>
+    <tr>
+      <td><h5>log.kafka.bootstrap.servers</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>String</td>
+      <td>Required Kafka server connection string for log store.</td>
+    </tr>
+    <tr>
+      <td><h5>log.retention</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>Duration</td>
+      <td>It means how long changes log will be kept. The default value is from the log system cluster.</td>

Review comment:
       `It means how long changes log will be kept` sounds a little weird. We can utilize the Kafka doc wording, like `The duration to keep a log file before deleting it`
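   For example, a sketch of this option in a DDL (values and duration format are illustrative):
   ```sql
   -- keep the changelog for 7 days instead of the cluster default
   CREATE TABLE T (a INT) WITH (
     'log.system' = 'kafka',
     'log.kafka.bootstrap.servers' = 'localhost:9092',
     'log.retention' = '7 d'
   );
   ```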

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.
+- Message queue, such as Apache Kafka, it is used in both source and 

Review comment:
       it is used as

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an implicit
+relationship with the external system. We call such Table as external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different, the connection information is already
+filled in the session environment, the user only needs to focus on the
+business logic when creating the DDL. The DDL is no longer just an
+implicit relationship; creating a table will create the corresponding
+physical storage, and dropping a table will delete the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL.
+- Message queue, such as Apache Kafka, it is used in both source and 
+  intermediate stages in this pipeline, to guarantee the latency stay
+  within seconds.
+- OLAP system, such as Clickhouse, it receives processed data in
+  streaming fashion and serving user’s ad-hoc queries. 
+- Batch storage, such as Apache Hive, it supports various operations
+  of the traditional batch, including `INSERT OVERWRITE`.
+
+Flink Table Store provides a table abstraction; you can use it as if
+it were a table in a database:
+- In Flink `batch` execution mode, it acts like a Hive table and
+  supports various operations of batch SQL. Query it to see the
+  latest snapshot.
+- In Flink `streaming` execution mode, it acts like a message queue.
+  Query it to get its changelog stream. It never drops records because
+  of TTL, and by default a query reads the full snapshot first,
+  followed by the incremental changes, as sketched below.
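+
+As a minimal sketch of the two modes (`execution.runtime-mode` is the
+standard Flink SQL option; the aggregate query itself is illustrative):
+
+```sql
+-- Batch mode: the query computes over the latest snapshot and terminates
+SET 'execution.runtime-mode' = 'batch';
+SELECT `behavior`, COUNT(*) AS cnt FROM MyTable GROUP BY `behavior`;
+
+-- Streaming mode: the same query first consumes the full snapshot,
+-- then keeps updating with the incremental changelog
+SET 'execution.runtime-mode' = 'streaming';
+SELECT `behavior`, COUNT(*) AS cnt FROM MyTable GROUP BY `behavior`;
+```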

Review comment:
       Different scan mode and log system configuration will result in different consuming behavior under streaming mode.
    <table class="table table-bordered">
        <thead>
        <tr>
          <th class="text-left" style="width: 20%">Scan Mode</th>
          <th class="text-center" style="width: 5%">Default</th>
          <th class="text-center" style="width: 60%">Description</th>
        </tr>
        </thead>
        <tbody>
        <tr>
          <td><h5>FULL</h5></td>
          <td>Yes</td>
          <td>When the log system is enabled, FULL scan mode performs a hybrid read, combining a bounded scan of the file store with an unbounded scan of the log store. When the log system is disabled, FULL scan mode performs an unbounded scan of the file store only.</td>
        </tr>
        <tr>
          <td><h5>LATEST</h5></td>
          <td>No</td>
          <td>When the log system is enabled, LATEST scan mode only reads the log system, with an unbounded scan from the latest offset. When the log system is disabled, LATEST scan mode does not scan any data.</td>
        </tr>
        <tr>
          <td><h5>FROM_TIMESTAMP</h5></td>
          <td>No</td>
          <td>When the log system is enabled, FROM_TIMESTAMP scan mode only reads the log system, with an unbounded scan from the user-specified offset. When the log system is disabled, FROM_TIMESTAMP scan mode does not scan any data.</td>
        </tr>
        </tbody>
    </table>
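
   As a purely hypothetical illustration (the option key `log.scan` is an assumption here, not something confirmed by this PR), the scan mode could be picked per query with a standard Flink SQL dynamic table option hint:

   ```sql
   -- Assumed option key; the values mirror the table above
   SELECT * FROM MyTable /*+ OPTIONS('log.scan' = 'latest') */;
   ```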

##########
File path: docs/content/docs/development/distribution.md
##########
@@ -0,0 +1,104 @@
+---
+title: "Distribution"
+weight: 3
+type: docs
+aliases:
+- /development/distribution.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Distribution
+
+The data distribution of Table Store consists of three concepts:
+Partition, Bucket, and Primary Key.
+
+```sql
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  behavior STRING,
+  dt STRING,
+  PRIMARY KEY (dt, user_id) NOT ENFORCED
+) PARTITIONED BY (dt) WITH (
+  'bucket' = '4'
+);
+```
+
+For example, the data of the `MyTable` table above is distributed
+in the following order:
+- Partition: data is isolated based on the partition fields.
+- Bucket: within a single partition, records are distributed into
+  4 buckets based on the hash value of the primary key.
+- Primary key: within a single bucket, records are sorted by the
+  primary key to build the LSM structure.
+
+## Partition
+
+Table Store has the same partitioning concept as Apache Hive:
+partitioning separates the data, so that various operations can be
+managed per partition.
+
+Partition filtering is the most effective way to improve performance;
+your query statements should contain partition filter conditions,
+as in the sketch below.
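+
+A minimal sketch (the date literal is made up for illustration):
+
+```sql
+-- Only the dt = '2022-03-27' partition of MyTable is read
+SELECT user_id, behavior FROM MyTable WHERE dt = '2022-03-27';
+```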
+
+## Bucket
+
+Records are hashed into different buckets according to the
+primary key, or the whole row if the table has no primary key.
+
+The number of buckets is very important as it determines the
+worst-case maximum processing parallelism. But it should not
+be too big, otherwise it will create a lot of small files.
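+
+A minimal sketch of the trade-off (the table and the bucket count of 8
+are arbitrary examples):
+
+```sql
+CREATE TABLE BucketedTable (
+  user_id BIGINT,
+  behavior STRING,
+  PRIMARY KEY (user_id) NOT ENFORCED
+) WITH (
+  -- At most 8 tasks can write this table in parallel; a much larger
+  -- value would scatter the data into many small files
+  'bucket' = '8'
+);
+```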

Review comment:
       `But it should not be too big, otherwise it will create a lot of small files.`
   
   `it` is ambiguous here, because the first `it` stands for the bucket num and the second `it` refers to Flink Table Store. Users who don't have this prior knowledge will get confused. What about
    `But it should not be too big, otherwise, the system will create a lot of small files.`?

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an
+implicit relationship with the external system. We call such a table an
+external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different: the connection information is already
+filled in the session environment, so the user only needs to focus on the
+business logic when writing the DDL. The DDL is no longer just an
+implicit relationship; creating a table creates the corresponding
+physical storage, and dropping a table deletes the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL:
+- Message queue, such as Apache Kafka: it is used in both source and
+  intermediate stages of the pipeline, to guarantee that the latency
+  stays within seconds.
+- OLAP system, such as ClickHouse: it receives processed data in a
+  streaming fashion and serves users' ad-hoc queries.
+- Batch storage, such as Apache Hive: it supports various operations
+  of traditional batch processing, including `INSERT OVERWRITE`.
+
+Flink Table Store provides a table abstraction; you can use it as if
+it were a table in a database:
+- In Flink `batch` execution mode, it acts like a Hive table and
+  supports various operations of batch SQL. Query it to see the
+  latest snapshot.
+- In Flink `streaming` execution mode, it acts like a message queue.
+  Query it to get its changelog stream. It never drops records because
+  of TTL, and by default a query reads the full snapshot first,
+  followed by the incremental changes.
+
+## Architecture
+
+Flink Table Store consist of two parts, LogStore and FileStore. The

Review comment:
       consists

##########
File path: docs/content/docs/development/overview.md
##########
@@ -0,0 +1,114 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /development/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+Flink Table Store is a unified streaming and batch store for building dynamic
+tables on Apache Flink. Flink Table Store serves as the storage engine behind
+Flink SQL Managed Table.
+
+## Managed Table
+
+The typical usage of Flink SQL DDL is to specify the 'connector' and fill in
+the complex connection information in 'with'. The DDL just establishes an
+implicit relationship with the external system. We call such a table an
+external table.
+
+```sql
+CREATE TABLE KafkaTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+) WITH (
+  'connector' = 'kafka',
+  'topic' = 'user_behavior',
+  'properties.bootstrap.servers' = 'localhost:9092',
+  'properties.group.id' = 'testGroup',
+  'scan.startup.mode' = 'earliest-offset',
+  'format' = 'csv'
+);
+```
+
+The managed table is different: the connection information is already
+filled in the session environment, so the user only needs to focus on the
+business logic when writing the DDL. The DDL is no longer just an
+implicit relationship; creating a table creates the corresponding
+physical storage, and dropping a table deletes the corresponding
+physical storage.
+
+```sql
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `item_id` BIGINT,
+  `behavior` STRING
+);
+```
+
+## Unify Streaming and Batch
+
+Three types of connectors are included in Flink SQL:
+- Message queue, such as Apache Kafka: it is used in both source and
+  intermediate stages of the pipeline, to guarantee that the latency
+  stays within seconds.
+- OLAP system, such as ClickHouse: it receives processed data in a
+  streaming fashion and serves users' ad-hoc queries.
+- Batch storage, such as Apache Hive: it supports various operations
+  of traditional batch processing, including `INSERT OVERWRITE`.
+

Review comment:
       Formal tech writing should avoid using the subjunctive mood. See [wiki](https://en.wikipedia.org/wiki/Subjunctive_mood):
   > Subjunctive forms of [verbs](https://en.wikipedia.org/wiki/Verb) are typically used to express various states of unreality such as: wish, emotion, possibility, judgment, opinion, obligation, or action that has not yet occurred
   
   What about
   > Flink Table Store provides table abstraction. It is used in a way that does not differ from the traditional database.



