[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #562: [FLINK-31248] Improve documentation for append-only table

via GitHub Tue, 28 Feb 2023 23:31:37 -0800


tsreaper commented on code in PR #562:
URL: https://github.com/apache/flink-table-store/pull/562#discussion_r1121244930



##########
docs/content/docs/concepts/primary-key-table.md:
##########
@@ -158,12 +154,28 @@ Full compaction changelog producer can produce complete changelog for any type o
 
 {{< /hint >}}
 
-## Changelog Tables without Primary Keys
+## Sequence Field
 
-Changelog tables can also be used without primary keys. Users can only insert or delete a whole record from the table. No update is supported.
+By default, the primary key table determines the merge order according to the input order. However, in distributed computing,
+there will be some cases that lead to data disorder. At this time, you can use a time field as `sequence.field`, for example:
 
-## Append-only Tables
+{{< tabs "sequence.field" >}}
+
+{{< tab "Flink" >}}
 
-By specifying `'write-mode' = 'append-only'` when creating the table, user creates an append-only table.
+```sql
+CREATE TABLE MyTable (
+    pk BIGINT PRIMARY KEY NOT ENFORCED,
+    v1 DOUBLE,
+    v2 BIGINT,
+    dt TIMESTAMP
+) WITH (
+    'sequence.field' = 'dt'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
 
-You can only insert a whole record into the table. No delete or update is supported and you cannot define primary keys. This type of table is suitable for use cases that do not require updates (such as log data synchronization).
+Regardless of the order of data input, the final correct result will always be obtained.

Review Comment:
   ```suggestion
   The record with the largest `sequence.field` value will be the last to merge, regardless of the input order.
   ```
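   A minimal Flink SQL sketch of the suggested wording. It assumes the `MyTable` DDL from the diff above, a Table Store catalog as the current catalog, and the default deduplicate merge engine; the inserted values are hypothetical. Both records share primary key 1, and the one with the larger `dt` is written first:

```sql
-- Assumes MyTable was created with 'sequence.field' = 'dt' and the default
-- deduplicate merge engine. All values below are hypothetical.
SET 'execution.runtime-mode' = 'batch';

INSERT INTO MyTable VALUES
    (1, 2.0, 20, TIMESTAMP '2023-03-01 10:00:00');  -- larger dt, written first
INSERT INTO MyTable VALUES
    (1, 1.0, 10, TIMESTAMP '2023-03-01 09:00:00');  -- smaller dt, written later

-- The record with the largest dt is merged last, so the expected result is
-- (1, 2.0, 20, 2023-03-01 10:00:00) even though it arrived first.
SELECT * FROM MyTable;
```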



##########
docs/content/docs/concepts/primary-key-table.md:
##########
@@ -158,12 +154,28 @@ Full compaction changelog producer can produce complete changelog for any type o
 
+By default, the primary key table determines the merge order according to the input order. However, in distributed computing,

Review Comment:
   ```suggestion
   By default, the primary key table determines the merge order according to the input order (the last input record will be the last to merge). However, in distributed computing,
   ```
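   For contrast, a sketch of the default behavior without `sequence.field`. The table `MyTableNoSeq` is hypothetical, and the sketch assumes a Table Store catalog and the default deduplicate merge engine:

```sql
-- Hypothetical primary key table without 'sequence.field':
-- merge order follows input order.
CREATE TABLE MyTableNoSeq (
    pk BIGINT PRIMARY KEY NOT ENFORCED,
    v1 DOUBLE
);

SET 'execution.runtime-mode' = 'batch';

INSERT INTO MyTableNoSeq VALUES (1, 1.0);  -- first input record
INSERT INTO MyTableNoSeq VALUES (1, 2.0);  -- last input record, merged last

-- With deduplicate merging, (1, 2.0) is expected to be the surviving row.
SELECT * FROM MyTableNoSeq;
```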



##########
docs/content/docs/concepts/append-only-table.md:
##########
@@ -0,0 +1,107 @@
+---
+title: "Append Only Table"
+weight: 7
+type: docs
+aliases:
+- /concepts/append-only-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Append Only Table
+
+By specifying `'write-mode' = 'append-only'` when creating the table, user creates an append-only table.
+
+You can only insert a complete record into the table. No delete or update is supported and you cannot define primary keys.
+This type of table is suitable for use cases that do not require updates (such as log data synchronization).
+
+## Bucketing
+
+You also need to define bucket number for Append-only table, see [Bucket]({{< ref "docs/concepts/basic-concepts#bucket" >}}).

Review Comment:
   ```suggestion
   You can also define the bucket number for an append-only table; see [Bucket]({{< ref "docs/concepts/basic-concepts#bucket" >}}).
   ```
   
   Users don't "need to" define the bucket number. They can always use the default bucket number, which is 1.
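   A sketch of what this looks like in DDL. The table and column names are hypothetical; `'write-mode'` and `bucket-key` come from the diff, and `'bucket'` is assumed to be the option behind the bucket number discussed here:

```sql
-- Hypothetical append-only table; names are illustrative only.
CREATE TABLE MyAppendTable (
    id BIGINT,
    content STRING,
    dt STRING
) PARTITIONED BY (dt) WITH (
    'write-mode' = 'append-only',
    'bucket' = '4',       -- optional; omitting it falls back to 1 bucket
    'bucket-key' = 'id'   -- hash records by id instead of by the whole row
);
```

   Dropping the `'bucket'` line keeps the table valid with the default of one bucket, which matches the point above that defining it is optional.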



##########
docs/content/docs/concepts/append-only-table.md:
##########
@@ -0,0 +1,107 @@
+## Bucketing
+
+You also need to define bucket number for Append-only table, see [Bucket]({{< ref "docs/concepts/basic-concepts#bucket" >}}).
+
+It is recommended that you set the `bucket-key` field. Otherwise, the data will be hashed according to the whole row,
+and the performance will be poor.
+
+## Streaming Read Order
+
+For streaming reads, records are produced in the following order:
+
+* For any two records from two different partitions
+  * If `scan.plan-sort-partition` is set to true, the record with a smaller partition value will be produced first.
+  * Otherwise, the record with an earlier partition creation time will be produced first.
+* For any two records from the same partition and the same bucket, the first written record will be produced first.
+* For any two records from the same partition but two different buckets, different buckets are processed by different tasks, there is no order guarantee between them.
+
+## Compaction
+
+By default, the sink node will automatically perform compaction to solve the small file problem. The following options

Review Comment:
   ```suggestion
   By default, the sink node will automatically perform compaction to control the number of files. The following options
   ```
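   The streaming read order described in the hunk above can be exercised with a query like the following. This is a sketch that assumes an existing append-only table named `MyLogTable` (hypothetical) and Flink's dynamic table options hint; on older Flink versions the hint may additionally require `table.dynamic-table-options.enabled` to be true:

```sql
-- Streaming read from a hypothetical append-only table MyLogTable.
SET 'execution.runtime-mode' = 'streaming';

-- Order records from different partitions by partition value rather than by
-- partition creation time. Within one partition and bucket, records are still
-- produced in write order; across buckets there is no ordering guarantee.
SELECT * FROM MyLogTable /*+ OPTIONS('scan.plan-sort-partition' = 'true') */;
```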



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
