This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch release-0.5
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git

commit 4345fa60c45ddc9ee12fd3dd4008e5553f1b31b3
Author: Jingsong <[email protected]>
AuthorDate: Mon Aug 28 19:33:00 2023 +0800

    [doc] Document performance for dynamic bucket
---
 docs/content/concepts/primary-key-table.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/content/concepts/primary-key-table.md b/docs/content/concepts/primary-key-table.md
index d82f3d6e3..982af7a30 100644
--- a/docs/content/concepts/primary-key-table.md
+++ b/docs/content/concepts/primary-key-table.md
@@ -53,13 +53,18 @@ Configure `'bucket' = '-1'`, Paimon dynamically maintains the index, automatic e
 Dynamic Bucket only support single write job. Please do not start multiple jobs to write to the same partition.
 {{< /hint >}}
 
-**Normal Dynamic Bucket Mode**:
+#### Normal Dynamic Bucket Mode
 
 When your updates do not cross partitions (no partitions, or primary keys contain all partition fields), Dynamic
-Bucket mode uses HASH index to maintain mapping from key to bucket, it requires more memory than fixed bucket mode,
-100 million entries in a partition takes up 1 GB more memory, partitions that are no longer active do not take up memory.
+Bucket mode uses HASH index to maintain mapping from key to bucket, it requires more memory than fixed bucket mode.
 
-**Cross Partitions Update Dynamic Bucket Mode**:
+Performance:
+
+1. Generally speaking, there is no performance loss, but there will be some additional memory consumption, **100 million**
+   entries in a partition takes up **1 GB** more memory, partitions that are no longer active do not take up memory.
+2. For tables with low update rates, this mode is recommended to significantly improve performance.
+
+#### Cross Partitions Update Dynamic Bucket Mode
 
 {{< hint info >}}
 This is an experimental feature.
@@ -68,10 +73,14 @@ This is an experimental feature.
 When you need cross partition updates (primary keys not contain all partition fields), Dynamic Bucket mode directly
 maintains the mapping of keys to partition and bucket, uses local disks, and initializes indexes by reading all
 existing keys in the table when starting stream write job. Different merge engines have different behaviors:
+
 1. Deduplicate: Delete data from the old partition and insert new data into the new partition.
 2. PartialUpdate & Aggregation: Insert new data into the old partition.
 3. FirstRow: Ignore new data if there is old value.
 
+Performance: For tables with a large amount of data, there will be a significant loss in performance. Moreover,
+initialization takes a long time.
+
 ## Merge Engines
 
 When Paimon sink receives two or more records with the same primary keys, it will merge them into one record to keep primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
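
The three merge-engine behaviors that the diff lists for cross-partition updates can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Paimon's implementation: the `MergeEngine` enum, the `route` function, and the plain `dict` standing in for the key-to-partition index are all hypothetical names introduced here, and bucket assignment is left out entirely.

```python
from enum import Enum


class MergeEngine(Enum):
    # Hypothetical labels for the engines named in the diff.
    DEDUPLICATE = "deduplicate"
    PARTIAL_UPDATE = "partial-update"
    AGGREGATION = "aggregation"
    FIRST_ROW = "first-row"


def route(index, engine, key, new_partition):
    """Return the (action, partition) pairs for one incoming record.

    `index` is a plain dict mapping primary key -> partition currently
    holding that key. It stands in for the key-to-partition mapping the
    diff describes and is updated in place.
    """
    old_partition = index.get(key)

    if old_partition is None:
        # Key not seen before: write it to the partition it arrived in.
        index[key] = new_partition
        return [("insert", new_partition)]

    if engine is MergeEngine.FIRST_ROW:
        # FirstRow: ignore new data if there is an old value.
        return []

    if old_partition == new_partition:
        # Same partition, so this is an ordinary update.
        return [("insert", new_partition)]

    if engine is MergeEngine.DEDUPLICATE:
        # Deduplicate: delete from the old partition, insert into the new one.
        index[key] = new_partition
        return [("delete", old_partition), ("insert", new_partition)]

    # PartialUpdate and Aggregation: insert the new data into the old partition.
    return [("insert", old_partition)]
```

For example, feeding the same key with two different partition values under `DEDUPLICATE` yields a delete in the first partition followed by an insert in the second, while `PARTIAL_UPDATE` keeps writing to whichever partition first held the key.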
