This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new d9a1b80a41 [doc] Document changelog producer to use lookup
d9a1b80a41 is described below
commit d9a1b80a41169c41eb2628790d8bc4e7fc68467c
Author: Jingsong <[email protected]>
AuthorDate: Mon Nov 25 15:23:44 2024 +0800
[doc] Document changelog producer to use lookup
---
.../primary-key-table/changelog-producer.md | 25 +++++++++++++---------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/docs/content/primary-key-table/changelog-producer.md
b/docs/content/primary-key-table/changelog-producer.md
index bf7a23fae2..011f7b6f27 100644
--- a/docs/content/primary-key-table/changelog-producer.md
+++ b/docs/content/primary-key-table/changelog-producer.md
@@ -58,9 +58,11 @@ By specifying `'changelog-producer' = 'input'`, Paimon
writers rely on their inp
## Lookup
-If your input can’t produce a complete changelog but you still want to get rid
of the costly normalized operator, you may consider using the `'lookup'`
changelog producer.
+If your input can’t produce a complete changelog but you still want to get rid
of the costly normalized operator, you
+may consider using the `'lookup'` changelog producer.
-By specifying `'changelog-producer' = 'lookup'`, Paimon will generate
changelog through `'lookup'` before committing the data writing.
+By specifying `'changelog-producer' = 'lookup'`, Paimon will generate
changelog through `'lookup'` before committing
+the written data (you can also enable [Async Compaction]({{< ref
"primary-key-table/compaction#asynchronous-compaction" >}})).
{{< img src="/img/changelog-producer-lookup.png">}}
@@ -105,23 +107,26 @@ important for performance).
## Full Compaction
-If you think the resource consumption of 'lookup' is too large, you can
consider using 'full-compaction' changelog producer,
-which can decouple data writing and changelog generation, and is more suitable
for scenarios with high latency (For example, 10 minutes).
+You can also consider using the 'full-compaction' changelog producer to generate
changelog; it is more suitable for scenarios
+that can tolerate high latency (for example, 30 minutes).
-By specifying `'changelog-producer' = 'full-compaction'`, Paimon will compare
the results between full compactions and produce the differences as changelog.
The latency of changelog is affected by the frequency of full compactions.
+1. By specifying `'changelog-producer' = 'full-compaction'`, Paimon will
compare the results between full compactions and
+produce the differences as changelog. The latency of changelog is affected by
the frequency of full compactions.
+2. By specifying the `full-compaction.delta-commits` table property, full
compaction will be constantly triggered after the given number of delta
+commits (checkpoints). This is set to 1 by default, so each checkpoint will
have a full compaction and generate a
+changelog.
-By specifying `full-compaction.delta-commits` table property, full compaction
will be constantly triggered after delta commits (checkpoints). This is set to
1 by default, so each checkpoint will have a full compression and generate a
change log.
+Generally speaking, the cost and resource consumption of full compaction are high, so
we recommend using the `'lookup'` changelog
+producer.
{{< img src="/img/changelog-producer-full-compaction.png">}}
{{< hint info >}}
-Full compaction changelog producer can produce complete changelog for any type
of source. However it is not as efficient as the input changelog producer and
the latency to produce changelog might be high.
+The full-compaction changelog producer can produce a complete changelog for any type
of source. However, it is not as
+efficient as the input changelog producer and the latency to produce changelog
might be high.
{{< /hint >}}
Full-compaction changelog-producer supports
`changelog-producer.row-deduplicate` to avoid generating -U, +U
changelog for the same record.
-
-(Note: Please increase `'execution.checkpointing.max-concurrent-checkpoints'`
Flink configuration, this is very
-important for performance).
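
The snapshot comparison described in the Full Compaction section ("compare the results between full compactions and produce the differences as changelog") can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Paimon's implementation: the function name `diff_snapshots`, the dict-based snapshots (primary key to row), and the tuple output are all hypothetical and chosen only for illustration. The row kinds +I, -U, +U and -D follow the usual changelog notation.

```python
def diff_snapshots(before, after):
    """Toy model: diff two compaction results and return changelog records.

    `before` and `after` are hypothetical snapshots mapping primary key -> row.
    Each record is a (row_kind, key, row) tuple.
    """
    changelog = []
    # Visit every key seen in either snapshot, in a stable order.
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            # Key appeared between the two compactions: insert.
            changelog.append(("+I", key, after[key]))
        elif key not in after:
            # Key disappeared between the two compactions: delete.
            changelog.append(("-D", key, before[key]))
        elif before[key] != after[key]:
            # Row changed: emit the old value, then the new value.
            changelog.append(("-U", key, before[key]))
            changelog.append(("+U", key, after[key]))
        # Identical rows emit nothing in this sketch.
    return changelog


if __name__ == "__main__":
    previous = {1: "a", 2: "b", 3: "c"}
    current = {1: "a", 2: "B", 4: "d"}
    for record in diff_snapshots(previous, current):
        print(record)
```

In this sketch a changelog record can only appear once the second snapshot exists, which is why the latency of the changelog follows the frequency of full compactions.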