[incubator-paimon] branch master updated: [doc] Optimize primary key table, tags, append-only docs

lzljs3620320 Wed, 06 Sep 2023 02:35:56 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new 376389159 [doc] Optimize primary key table, tags, append-only docs
376389159 is described below

commit 376389159987739d86d08e77b901c1dc990bd667
Author: Jingsong <[email protected]>
AuthorDate: Wed Sep 6 17:32:50 2023 +0800

    [doc] Optimize primary key table, tags, append-only docs
---
 docs/content/concepts/append-only-table.md    | 3 ++-
 docs/content/concepts/primary-key-table.md    | 8 ++++++++
 docs/content/maintenance/manage-tags.md       | 2 +-
 docs/content/maintenance/write-performance.md | 2 +-
 4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/docs/content/concepts/append-only-table.md b/docs/content/concepts/append-only-table.md
index 49c992f66..3ae9d88b3 100644
--- a/docs/content/concepts/append-only-table.md
+++ b/docs/content/concepts/append-only-table.md
@@ -207,8 +207,9 @@ although we can stream read and write still). All the records will go into one d
 and we do not maintain the order anymore. As we don't have the concept of bucket, we will not shuffle the input records by bucket anymore,
 which will speed up the inserting.
 
-{{< img src="/img/for-scalable.png">}}
+Using this mode, you can replace your Hive table to lake table.
 
+{{< img src="/img/for-scalable.png">}}
 
 ### Compaction
 
diff --git a/docs/content/concepts/primary-key-table.md b/docs/content/concepts/primary-key-table.md
index 8a54cfb69..06153616f 100644
--- a/docs/content/concepts/primary-key-table.md
+++ b/docs/content/concepts/primary-key-table.md
@@ -263,6 +263,8 @@ By specifying `'merge-engine' = 'first-row'`, users can keep the first row of th
 2. You can not specify `sequence.field`.
 3. Not accept `DELETE` and `UPDATE_BEFORE` message.
 
+This is of great help in replacing log deduplication in streaming computation.
+
 ## Changelog Producers
 
 Streaming queries will continuously produce the latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "concepts/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
@@ -341,6 +343,9 @@ Lookup will cache data on the memory and local disk, you can use the following o
 Lookup changelog-producer supports `changelog-producer.row-deduplicate` to avoid generating -U, +U
 changelog for the same record.
 
+(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration, this is very
+important for performance).
+
 ### Full Compaction
 
 If you think the resource consumption of 'lookup' is too large, you can consider using 'full-compaction' changelog producer,
@@ -361,6 +366,9 @@ Full compaction changelog producer can produce complete changelog for any type o
 Full-compaction changelog-producer supports `changelog-producer.row-deduplicate` to avoid generating -U, +U
 changelog for the same record.
 
+(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration, this is very
+important for performance).
+
 ## Sequence Field
 
 By default, the primary key table determines the merge order according to the input order (the last input record will be the last to merge). However, in distributed computing,
diff --git a/docs/content/maintenance/manage-tags.md b/docs/content/maintenance/manage-tags.md
index a0a87e2c5..1bb97535a 100644
--- a/docs/content/maintenance/manage-tags.md
+++ b/docs/content/maintenance/manage-tags.md
@@ -26,7 +26,7 @@ under the License.
 
 # Manage Tags
 
-Paimon's snapshots can provide a easy way to query historical data. But in most scenarios, a job will generate too many
+Paimon's snapshots can provide an easy way to query historical data. But in most scenarios, a job will generate too many
 snapshots and table will expire old snapshots according to table configuration. Snapshot expiration will also delete old
 data files, and the historical data of expired snapshots cannot be queried anymore.
 
diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index 21d18faad..7c2b711e8 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -69,7 +69,7 @@ It is recommended that the parallelism of sink should be less than or equal to t
 ### Asynchronous Compaction
 
 Compaction is inherently asynchronous, but if you want it to be completely asynchronous and not blocking writing,
-expect a mode to have maximum write throughput, the compaction can be done slowly and not in a hurry.
+expect a mode to have maximum writing throughput, the compaction can be done slowly and not in a hurry.
 You can use the following strategies for your table:
 
 ```shell
