This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git
The following commit(s) were added to refs/heads/master by this push:
new d2978ddb6 [doc] Move Watermark and Bounded Stream to Append only table
page
d2978ddb6 is described below
commit d2978ddb6720186d27110f7ff14d06f3ed688b7c
Author: JingsongLi <[email protected]>
AuthorDate: Wed Apr 5 14:54:23 2023 +0800
[doc] Move Watermark and Bounded Stream to Append only table page
---
docs/content/concepts/append-only-table.md | 100 +++++++++++++++++++++++++----
docs/content/how-to/querying-tables.md | 72 ---------------------
2 files changed, 89 insertions(+), 83 deletions(-)
diff --git a/docs/content/concepts/append-only-table.md
b/docs/content/concepts/append-only-table.md
index 502593ce8..e20b9252c 100644
--- a/docs/content/concepts/append-only-table.md
+++ b/docs/content/concepts/append-only-table.md
@@ -38,16 +38,6 @@ You can also define bucket number for Append-only table, see
[Bucket]({{< ref "c
It is recommended that you set the `bucket-key` field. Otherwise, the data
will be hashed according to the whole row,
and the performance will be poor.
-## Streaming Read Order
-
-For streaming reads, records are produced in the following order:
-
-* For any two records from two different partitions
- * If `scan.plan-sort-partition` is set to true, the record with a smaller
partition value will be produced first.
- * Otherwise, the record with an earlier partition creation time will be
produced first.
-* For any two records from the same partition and the same bucket, the first
written record will be produced first.
-* For any two records from the same partition but two different buckets,
different buckets are processed by different tasks, there is no order guarantee
between them.
-
## Compaction
By default, the sink node will automatically perform compaction to control the
number of files. The following options
@@ -76,14 +66,102 @@ control the strategy of compaction:
<td>For file set [f_0,...,f_N], the minimum file number which
satisfies sum(size(f_i)) >= targetFileSize to trigger a compaction for
append-only table. This value avoids almost-full-file to be compacted, which is
not cost-effective.</td>
</tr>
<tr>
- <td><h5>compaction.early-max.file-num</h5></td>
+ <td><h5>compaction.max.file-num</h5></td>
<td style="word-wrap: break-word;">50</td>
<td>Integer</td>
<td>For file set [f_0,...,f_N], the maximum file number to trigger
a compaction for append-only table, even if sum(size(f_i)) < targetFileSize.
This value avoids pending too much small files, which slows down the
performance.</td>
</tr>
+ <tr>
+ <td><h5>full-compaction.delta-commits</h5></td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>Integer</td>
+ <td>Full compaction will be constantly triggered after delta
commits.</td>
+ </tr>
+ </tbody>
+</table>
+
+## Streaming Source
+
+Streaming source behavior is currently only supported in the Flink engine.
+
+### Streaming Read Order
+
+For streaming reads, records are produced in the following order:
+
+* For any two records from two different partitions
+ * If `scan.plan-sort-partition` is set to true, the record with a smaller
partition value will be produced first.
+ * Otherwise, the record with an earlier partition creation time will be
produced first.
+* For any two records from the same partition and the same bucket, the first
written record will be produced first.
+* For any two records from the same partition but two different buckets,
different buckets are processed by different tasks, there is no order guarantee
between them.
+
+### Watermark Definition
+
+You can define watermarks for reading Paimon tables:
+
+```sql
+CREATE TABLE T (
+ `user` BIGINT,
+ product STRING,
+ order_time TIMESTAMP(3),
+ WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH (...);
+
+-- launch a streaming window-aggregation job to read T
+SELECT window_start, window_end, SUM(f0)
+FROM TABLE(TUMBLE(TABLE T, DESCRIPTOR(order_time), INTERVAL '10' MINUTES))
+GROUP BY window_start, window_end;
+```
+
+You can also enable [Flink Watermark
alignment](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_),
+which will make sure no sources/splits/shards/partitions increase their
watermarks too far ahead of the rest:
+
+<table class="configuration table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style="width: 20%">Key</th>
+ <th class="text-left" style="width: 15%">Default</th>
+ <th class="text-left" style="width: 10%">Type</th>
+ <th class="text-left" style="width: 55%">Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>scan.watermark.alignment.group</h5></td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>String</td>
+ <td>A group of sources to align watermarks.</td>
+ </tr>
+ <tr>
+ <td><h5>scan.watermark.alignment.max-drift</h5></td>
+ <td style="word-wrap: break-word;">(none)</td>
+ <td>Duration</td>
+ <td>Maximal drift to align watermarks, before we pause consuming
from the source/task/partition.</td>
+ </tr>
</tbody>
</table>
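+
+These options can be set per query through Flink's dynamic table options hint; the group name and max-drift values below are only illustrative:
+
+```sql
+-- align this source's watermark with the other sources in the same group,
+-- pausing consumption if it drifts more than 1 minute ahead of them
+SELECT * FROM T /*+ OPTIONS(
+    'scan.watermark.alignment.group' = 'consumer-group-1',
+    'scan.watermark.alignment.max-drift' = '1 min') */;
+```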
+
+### Bounded Stream
+
+Streaming Source can also be bounded. You can specify 'scan.bounded.watermark'
to define the end condition for bounded streaming mode: stream reading will end
once a snapshot with a larger watermark is encountered.
+
+The watermark in a snapshot is generated by the writer. For example, you can
specify a kafka source and declare a watermark definition on it.
+When you use this kafka source to write to a Paimon table, the snapshots of
the Paimon table will carry the corresponding watermark,
+so that you can use the bounded watermark feature when streaming-reading this
Paimon table.
+
+```sql
+CREATE TABLE kafka_table (
+ `user` BIGINT,
+ product STRING,
+ order_time TIMESTAMP(3),
+ WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
+) WITH ('connector' = 'kafka'...);
+
+-- launch a streaming insert job
+INSERT INTO paimon_table SELECT * FROM kafka_table;
+
+-- launch a bounded streaming job to read paimon_table
+SELECT * FROM paimon_table /*+ OPTIONS('scan.bounded.watermark'='...') */;
+```
+
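+Watermarks in Flink are millisecond timestamps, so a bounded read looks like the following sketch (the watermark value below is an illustrative epoch-millisecond value, not taken from a real run):
+
+```sql
+-- the stream ends once a snapshot with a watermark beyond this value is seen
+SELECT * FROM paimon_table
+    /*+ OPTIONS('scan.bounded.watermark' = '1680677663000') */;
+```
+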
## Example
The following is an example of creating the Append-Only table and specifying
the bucket key.
diff --git a/docs/content/how-to/querying-tables.md
b/docs/content/how-to/querying-tables.md
index 55863e705..cf8aad5e6 100644
--- a/docs/content/how-to/querying-tables.md
+++ b/docs/content/how-to/querying-tables.md
@@ -92,78 +92,6 @@ Users can also adjust `changelog-producer` table property to
specify the pattern
{{< img src="/img/scan-mode.png">}}
-## Streaming Source
-
-Streaming source behavior is only supported in Flink engine at present.
-
-### Watermark Definition
-
-You can define watermark for reading Paimon tables:
-
-```sql
-CREATE TABLE T (
- `user` BIGINT,
- product STRING,
- order_time TIMESTAMP(3),
- WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
-);
-
--- launch a bounded streaming job to read paimon_table
-SELECT window_start, window_end, SUM(f0) FROM
- TUMBLE(TABLE T, DESCRIPTOR(order_time), INTERVAL '10' MINUTES)) GROUP BY
window_start, window_end; */;
-```
-
-You can also enable [Flink Watermark
alignment](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_),
-which will make sure no sources/splits/shards/partitions increase their
watermarks too far ahead of the rest:
-
-<table class="configuration table table-bordered">
- <thead>
- <tr>
- <th class="text-left" style="width: 20%">Key</th>
- <th class="text-left" style="width: 15%">Default</th>
- <th class="text-left" style="width: 10%">Type</th>
- <th class="text-left" style="width: 55%">Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><h5>scan.watermark.alignment.group</h5></td>
- <td style="word-wrap: break-word;">(none)</td>
- <td>String</td>
- <td>A group of sources to align watermarks.</td>
- </tr>
- <tr>
- <td><h5>scan.watermark.alignment.max-drift</h5></td>
- <td style="word-wrap: break-word;">(none)</td>
- <td>Duration</td>
- <td>Maximal drift to align watermarks, before we pause consuming
from the source/task/partition.</td>
- </tr>
- </tbody>
-</table>
-
-### Bounded Stream
-
-Streaming Source can also be bounded, you can specify 'scan.bounded.watermark'
to define the end condition for bounded streaming mode, stream reading will end
until a larger watermark snapshot is encountered.
-
-Watermark in snapshot is generated by writer, for example, you can specify a
kafka source and declare the definition of watermark.
-When using this kafka source to write to Paimon table, the snapshots of Paimon
table will generate the corresponding watermark,
-so that you can use the feature of bounded watermark when streaming reads of
this Paimon table.
-
-```sql
-CREATE TABLE kafka_table (
- `user` BIGINT,
- product STRING,
- order_time TIMESTAMP(3),
- WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
-) WITH ('connector' = 'kafka'...);
-
--- launch a streaming insert job
-INSERT INTO paimon_table SELECT * FROM kakfa_table;
-
--- launch a bounded streaming job to read paimon_table
-SELECT * FROM paimon_table /*+ OPTIONS('scan.bounded.watermark'='...') */;
-```
-
## Time Travel
Currently, Paimon supports time travel for Flink and Spark 3 (requires Spark
3.3+).