[incubator-paimon] 06/11: [doc] Document performance for dynamic bucket

lzljs3620320 Tue, 05 Sep 2023 20:20:29 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch release-0.5
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git


commit 4345fa60c45ddc9ee12fd3dd4008e5553f1b31b3
Author: Jingsong <[email protected]>
AuthorDate: Mon Aug 28 19:33:00 2023 +0800

    [doc] Document performance for dynamic bucket
---
 docs/content/concepts/primary-key-table.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/content/concepts/primary-key-table.md b/docs/content/concepts/primary-key-table.md
index d82f3d6e3..982af7a30 100644
--- a/docs/content/concepts/primary-key-table.md
+++ b/docs/content/concepts/primary-key-table.md
@@ -53,13 +53,18 @@ Configure `'bucket' = '-1'`, Paimon dynamically maintains the index, automatic e
 Dynamic Bucket only support single write job. Please do not start multiple jobs to write to the same partition.
 {{< /hint >}}
 
-**Normal Dynamic Bucket Mode**:
+#### Normal Dynamic Bucket Mode
 
 When your updates do not cross partitions (no partitions, or primary keys contain all partition fields), Dynamic
-Bucket mode uses HASH index to maintain mapping from key to bucket, it requires more memory than fixed bucket mode,
-100 million entries in a partition takes up 1 GB more memory, partitions that are no longer active do not take up memory.
+Bucket mode uses HASH index to maintain mapping from key to bucket, it requires more memory than fixed bucket mode.
 
-**Cross Partitions Update Dynamic Bucket Mode**:
+Performance:
+
+1. Generally speaking, there is no performance loss, but there will be some additional memory consumption, **100 million**
+   entries in a partition takes up **1 GB** more memory, partitions that are no longer active do not take up memory.
+2. For tables with low update rates, this mode is recommended to significantly improve performance.
+
+#### Cross Partitions Update Dynamic Bucket Mode
 
 {{< hint info >}}
 This is an experimental feature.
@@ -68,10 +73,14 @@ This is an experimental feature.
 When you need cross partition updates (primary keys not contain all partition fields), Dynamic Bucket mode directly
 maintains the mapping of keys to partition and bucket, uses local disks, and initializes indexes by reading all
 existing keys in the table when starting stream write job. Different merge engines have different behaviors:
+
 1. Deduplicate: Delete data from the old partition and insert new data into the new partition.
 2. PartialUpdate & Aggregation: Insert new data into the old partition.
 3. FirstRow: Ignore new data if there is old value.
 
+Performance: For tables with a large amount of data, there will be a significant loss in performance. Moreover,
+initialization takes a long time.
+
 ## Merge Engines
 
 When Paimon sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
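
A rough sizing sketch for the memory figure in the first hunk above (100 million entries in a partition taking about 1 GB more memory, with inactive partitions taking none). It assumes the cost scales linearly with the number of entries and reads "1 GB" as 10^9 bytes; neither assumption is stated in the patch, and the function names are hypothetical, not part of Paimon.

```python
from typing import Iterable

# Figures quoted in the patch: 100 million entries in a partition
# take up about 1 GB of additional memory.
REFERENCE_ENTRIES = 100_000_000
REFERENCE_BYTES = 10**9  # assumption: "1 GB" interpreted as 10^9 bytes


def bytes_per_entry() -> float:
    """Per-entry overhead implied by the quoted figure (assumes linear scaling)."""
    return REFERENCE_BYTES / REFERENCE_ENTRIES


def estimate_index_memory_bytes(active_partition_entries: Iterable[int]) -> float:
    """Estimate extra memory for the hash index.

    Only active partitions are passed in, since the patch states that
    partitions that are no longer active do not take up memory.
    """
    return sum(active_partition_entries) * bytes_per_entry()


if __name__ == "__main__":
    # Hypothetical workload: two active partitions being written.
    estimate = estimate_index_memory_bytes([50_000_000, 25_000_000])
    print(f"estimated extra memory: {estimate / 1e6:.0f} MB")
```

Under these assumptions the quoted figure works out to roughly 10 bytes per key, so the estimate is only a planning aid; actual usage depends on the writer's configuration and should be measured.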
