This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 376389159 [doc] Optimize primary key table, tags, append-only docs
376389159 is described below
commit 376389159987739d86d08e77b901c1dc990bd667
Author: Jingsong <[email protected]>
AuthorDate: Wed Sep 6 17:32:50 2023 +0800
[doc] Optimize primary key table, tags, append-only docs
---
docs/content/concepts/append-only-table.md | 3 ++-
docs/content/concepts/primary-key-table.md | 8 ++++++++
docs/content/maintenance/manage-tags.md | 2 +-
docs/content/maintenance/write-performance.md | 2 +-
4 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/docs/content/concepts/append-only-table.md b/docs/content/concepts/append-only-table.md
index 49c992f66..3ae9d88b3 100644
--- a/docs/content/concepts/append-only-table.md
+++ b/docs/content/concepts/append-only-table.md
@@ -207,8 +207,9 @@ although we can stream read and write still). All the records will go into one d
and we do not maintain the order anymore. As we don't have the concept of bucket, we will not shuffle the input records by bucket anymore,
which will speed up the inserting.
-{{< img src="/img/for-scalable.png">}}
+Using this mode, you can replace your Hive table to lake table.
+{{< img src="/img/for-scalable.png">}}
### Compaction
diff --git a/docs/content/concepts/primary-key-table.md b/docs/content/concepts/primary-key-table.md
index 8a54cfb69..06153616f 100644
--- a/docs/content/concepts/primary-key-table.md
+++ b/docs/content/concepts/primary-key-table.md
@@ -263,6 +263,8 @@ By specifying `'merge-engine' = 'first-row'`, users can keep the first row of th
2. You can not specify `sequence.field`.
3. Not accept `DELETE` and `UPDATE_BEFORE` message.
+This is of great help in replacing log deduplication in streaming computation.
+
## Changelog Producers
Streaming queries will continuously produce the latest changes. These changes can come from the underlying table files or from an [external log system]({{< ref "concepts/external-log-systems" >}}) like Kafka. Compared to the external log system, changes from table files have lower cost but higher latency (depending on how often snapshots are created).
@@ -341,6 +343,9 @@ Lookup will cache data on the memory and local disk, you can use the following o
Lookup changelog-producer supports `changelog-producer.row-deduplicate` to avoid generating -U, +U
changelog for the same record.
+(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration, this is very
+important for performance).
+
### Full Compaction
If you think the resource consumption of 'lookup' is too large, you can consider using 'full-compaction' changelog producer,
@@ -361,6 +366,9 @@ Full compaction changelog producer can produce complete changelog for any type o
Full-compaction changelog-producer supports `changelog-producer.row-deduplicate` to avoid generating -U, +U
changelog for the same record.
+(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration, this is very
+important for performance).
+
## Sequence Field
By default, the primary key table determines the merge order according to the input order (the last input record will be the last to merge). However, in distributed computing,
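The two record-level behaviors documented in this file can be illustrated with a minimal Python sketch. It is not Paimon's implementation: the row-kind strings, the `first_row_merge` and `changelog_for_update` helpers, and the in-memory dict standing in for the table are all hypothetical. `first_row_merge` keeps the first row per primary key and rejects `DELETE`/`UPDATE_BEFORE` input, matching the first-row merge engine constraints above; `changelog_for_update` shows how `changelog-producer.row-deduplicate` suppresses a -U/+U pair when a record's old and new values are identical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row-kind labels, styled after Flink's changelog notation.
INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE = "+I", "-U", "+U", "-D"

# (row_kind, primary_key, value)
Record = Tuple[str, str, dict]


def first_row_merge(records: List[Record]) -> Dict[str, dict]:
    """Sketch of first-row semantics: keep only the first row seen per primary key."""
    table: Dict[str, dict] = {}
    for kind, key, value in records:
        if kind in (DELETE, UPDATE_BEFORE):
            # The docs state that first-row does not accept these messages.
            raise ValueError(f"first-row merge engine rejects {kind} for key {key!r}")
        # setdefault keeps the existing (first) value and ignores later ones.
        table.setdefault(key, value)
    return table


def changelog_for_update(key: str, old: Optional[dict], new: dict,
                         row_deduplicate: bool) -> List[Record]:
    """Sketch of the changelog emitted when `key` moves from `old` to `new`."""
    if old is None:
        # No previous value: the change is a plain insert.
        return [(INSERT, key, new)]
    if row_deduplicate and old == new:
        # Same record before and after: skip the redundant -U/+U pair.
        return []
    return [(UPDATE_BEFORE, key, old), (UPDATE_AFTER, key, new)]


if __name__ == "__main__":
    events = [(INSERT, "k1", {"v": 1}), (INSERT, "k2", {"v": 7}), (UPDATE_AFTER, "k1", {"v": 2})]
    print(first_row_merge(events))
    print(changelog_for_update("k1", {"v": 1}, {"v": 1}, row_deduplicate=True))
    print(changelog_for_update("k1", {"v": 1}, {"v": 1}, row_deduplicate=False))
```

Running it prints `{'k1': {'v': 1}, 'k2': {'v': 7}}` (the later `+U` for `k1` is ignored), then `[]` with deduplication enabled, then a `-U`/`+U` pair without it.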
diff --git a/docs/content/maintenance/manage-tags.md b/docs/content/maintenance/manage-tags.md
index a0a87e2c5..1bb97535a 100644
--- a/docs/content/maintenance/manage-tags.md
+++ b/docs/content/maintenance/manage-tags.md
@@ -26,7 +26,7 @@ under the License.
# Manage Tags
-Paimon's snapshots can provide a easy way to query historical data. But in most scenarios, a job will generate too many
+Paimon's snapshots can provide an easy way to query historical data. But in most scenarios, a job will generate too many
snapshots and table will expire old snapshots according to table configuration. Snapshot expiration will also delete old
data files, and the historical data of expired snapshots cannot be queried anymore.
diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index 21d18faad..7c2b711e8 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -69,7 +69,7 @@ It is recommended that the parallelism of sink should be less than or equal to t
### Asynchronous Compaction
Compaction is inherently asynchronous, but if you want it to be completely asynchronous and not blocking writing,
-expect a mode to have maximum write throughput, the compaction can be done slowly and not in a hurry.
+expect a mode to have maximum writing throughput, the compaction can be done slowly and not in a hurry.
You can use the following strategies for your table:
```shell