This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 4c2ba07c79 [doc] Add doc for precommit-compact
4c2ba07c79 is described below
commit 4c2ba07c7938a92c1d23a18a6dc41d15aaae48bd
Author: JingsongLi <[email protected]>
AuthorDate: Tue Jan 14 16:37:27 2025 +0800
[doc] Add doc for precommit-compact
---
docs/content/append-table/streaming.md | 15 ++++++++++++++-
docs/content/primary-key-table/changelog-producer.md | 2 +-
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/docs/content/append-table/streaming.md b/docs/content/append-table/streaming.md
index acafb5b021..80217ff6a9 100644
--- a/docs/content/append-table/streaming.md
+++ b/docs/content/append-table/streaming.md
@@ -30,7 +30,20 @@ You can streaming write to the Append table in a very flexible way through Flink
Flink, using it like a queue. The only difference is that its latency is in minutes. Its advantages are very low cost
and the ability to push down filters and projection.
-## Automatic small file merging
+## Pre small file merging
+
+"Pre" means that this compaction occurs before committing files to the snapshot.
+
+If Flink's checkpoint interval is short (for example, 30 seconds), each snapshot may produce lots of small changelog
+files. Too many files may put a burden on the distributed storage cluster.
+
+In order to compact small changelog files into large ones, you can set the table option `precommit-compact = true`.
+The default value of this option is false. If true, a compact coordinator and a compact worker operator will be added
+after the writer operator to copy small changelog files into large ones.
+
+## Post small file merging
+
+"Post" means that this compaction occurs after committing files to the snapshot.
In a streaming writing job without bucket definition, there is no compaction in the writer; instead, the job will use
`Compact Coordinator` to scan the small files and pass compaction tasks to `Compact Worker`. In streaming mode, if you
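To illustrate the option described in this hunk, here is a minimal Flink SQL sketch; the table name and schema are hypothetical, only the `precommit-compact` option comes from the change above:

```sql
-- Hypothetical append table; pre-commit compaction is enabled at creation time.
CREATE TABLE my_append_table (
    id INT,
    data STRING
) WITH (
    'precommit-compact' = 'true'
);

-- Or enable it on an existing table:
ALTER TABLE my_append_table SET ('precommit-compact' = 'true');
```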
diff --git a/docs/content/primary-key-table/changelog-producer.md b/docs/content/primary-key-table/changelog-producer.md
index a9364ee9f0..916e89a010 100644
--- a/docs/content/primary-key-table/changelog-producer.md
+++ b/docs/content/primary-key-table/changelog-producer.md
@@ -138,6 +138,6 @@ For `input`, `lookup`, `full-compaction` 'changelog-producer'.
If Flink's checkpoint interval is short (for example, 30 seconds) and the number of buckets is large, each snapshot may
produce lots of small changelog files. Too many files may put a burden on the distributed storage cluster.
-In order to compact small changelog files into large ones, you can set the table option `changelog.precommit-compact = true`.
+In order to compact small changelog files into large ones, you can set the table option `precommit-compact = true`.
Default value of this option is false, if true, it will add a compact coordinator and worker operator after the writer
operator, which copies changelog files into large ones.
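The same option applies to the primary-key case this hunk documents. A minimal sketch; the table name, schema, bucket count, and changelog producer choice are hypothetical, only `precommit-compact` itself comes from the change above:

```sql
-- Hypothetical primary-key table with a changelog producer; small changelog
-- files produced at each checkpoint are merged before the commit.
CREATE TABLE my_pk_table (
    id INT,
    v STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'bucket' = '8',
    'changelog-producer' = 'lookup',
    'precommit-compact' = 'true'
);
```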