[incubator-paimon] branch master updated: [doc] Document Prioritize write throughput mode

lzljs3620320 Fri, 30 Jun 2023 04:05:58 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new 035b6347b [doc] Document Prioritize write throughput mode
035b6347b is described below

commit 035b6347b218cfc4c624c1590bd49300c3edb87a
Author: JingsongLi <[email protected]>
AuthorDate: Fri Jun 30 19:05:45 2023 +0800

    [doc] Document Prioritize write throughput mode
---
 docs/content/maintenance/write-performance.md | 59 ++++++++++++++++++---------
 1 file changed, 39 insertions(+), 20 deletions(-)

diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index 112ca9a47..140173c6f 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -58,19 +58,14 @@ It is recommended that the parallelism of sink should be less than or equal to t
     </tbody>
 </table>
 
-## Write Initialize
-
-In the initialization of write, the writer of the bucket needs to read all historical files. If there is a bottleneck
-here (For example, writing a large number of partitions simultaneously), you can use `write-manifest-cache` to cache
-the read manifest data to accelerate initialization.
-
 ## Compaction
 
-### Number of Sorted Runs to Trigger Compaction
-
-Paimon uses [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/file-layouts#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+### Number of Sorted Runs to Pause Writing
 
-One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/file-layouts#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
+When number of sorted runs is small, Paimon writers will perform compaction asynchronously in separated threads, so
+records can be continuously written into the table. However to avoid unbounded growth of sorted runs, writers will
+have to pause writing when the number of sorted runs hits the threshold. The following table property determines
+the threshold.
 
 <table class="table table-bordered">
     <thead>
@@ -84,20 +79,38 @@ One can easily see that too many sorted runs will result in poor query performan
     </thead>
     <tbody>
     <tr>
-      <td><h5>num-sorted-run.compaction-trigger</h5></td>
+      <td><h5>num-sorted-run.stop-trigger</h5></td>
       <td>No</td>
-      <td style="word-wrap: break-word;">5</td>
+      <td style="word-wrap: break-word;">(none)</td>
       <td>Integer</td>
-      <td>The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run).</td>
+      <td>The number of sorted runs that trigger the stopping of writes, the default value is 'num-sorted-run.compaction-trigger' + 1.</td>
     </tr>
     </tbody>
 </table>
 
-Compaction will become less frequent when `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+Write stalls will become less frequent when `num-sorted-run.stop-trigger` becomes larger, thus improving writing
+performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the
+table. If you are concerned about the OOM of memory, please configure the following option `sort-spill-threshold`.
+Its value depends on your memory size.
 
-### Number of Sorted Runs to Pause Writing
+### Prioritize write throughput
+
+If you expect a mode to have maximum write throughput, the compaction can be done slowly and not in a hurry.
+You can use the following strategies for your table:
+
+```shell
+num-sorted-run.stop-trigger = 2147483647
+sort-spill-threshold = 10
+```
+
+This configuration will generate more files during peak write periods and gradually merge into optimal read
+performance during low write periods.
 
-When number of sorted runs is small, Paimon writers will perform compaction asynchronously in separated threads, so records can be continuously written into the table. However to avoid unbounded growth of sorted runs, writers will have to pause writing when the number of sorted runs hits the threshold. The following table property determines the threshold.
+### Number of Sorted Runs to Trigger Compaction
+
+Paimon uses [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/file-layouts#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+
+One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/file-layouts#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
 
 <table class="table table-bordered">
     <thead>
@@ -111,16 +124,22 @@ When number of sorted runs is small, Paimon writers will perform compaction asyn
     </thead>
     <tbody>
     <tr>
-      <td><h5>num-sorted-run.stop-trigger</h5></td>
+      <td><h5>num-sorted-run.compaction-trigger</h5></td>
       <td>No</td>
-      <td style="word-wrap: break-word;">(none)</td>
+      <td style="word-wrap: break-word;">5</td>
       <td>Integer</td>
-      <td>The number of sorted runs that trigger the stopping of writes, the default value is 'num-sorted-run.compaction-trigger' + 1.</td>
+      <td>The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run).</td>
     </tr>
     </tbody>
 </table>
 
-Write stalls will become less frequent when `num-sorted-run.stop-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+Compaction will become less frequent when `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+
+## Write Initialize
+
+In the initialization of write, the writer of the bucket needs to read all historical files. If there is a bottleneck
+here (For example, writing a large number of partitions simultaneously), you can use `write-manifest-cache` to cache
+the read manifest data to accelerate initialization.
 
 ## Memory
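
The relationship between the two thresholds touched by this diff can be sketched in a few lines of Python. This is only an illustration of the documented defaults (`num-sorted-run.compaction-trigger` defaults to 5, and `num-sorted-run.stop-trigger` defaults to that value plus 1). The helper `effective_stop_trigger` and the constant names are hypothetical and are not part of Paimon's API.

```python
# Hypothetical helper, NOT Paimon code: it only models the defaults
# described in the documentation changed by this commit.

DEFAULT_COMPACTION_TRIGGER = 5  # documented default of num-sorted-run.compaction-trigger


def effective_stop_trigger(options):
    """Resolve the stop-trigger implied by a dict of table options."""
    explicit = options.get("num-sorted-run.stop-trigger")
    if explicit is not None:
        # An explicitly configured value wins over the derived default.
        return int(explicit)
    compaction_trigger = int(
        options.get("num-sorted-run.compaction-trigger", DEFAULT_COMPACTION_TRIGGER)
    )
    # Documented default: 'num-sorted-run.compaction-trigger' + 1
    return compaction_trigger + 1


# Options taken from the "Prioritize write throughput" section of the diff.
WRITE_THROUGHPUT_OPTIONS = {
    "num-sorted-run.stop-trigger": "2147483647",
    "sort-spill-threshold": "10",
}

print(effective_stop_trigger({}))
print(effective_stop_trigger({"num-sorted-run.compaction-trigger": "8"}))
print(effective_stop_trigger(WRITE_THROUGHPUT_OPTIONS))
```

With no options set, the sketch resolves the stop trigger to 6. With the write-throughput settings it resolves to 2147483647, the largest 32-bit signed integer, so under this model writers would effectively never reach the pause threshold.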
