This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 035b6347b [doc] Document Prioritize write throughput mode
035b6347b is described below
commit 035b6347b218cfc4c624c1590bd49300c3edb87a
Author: JingsongLi <[email protected]>
AuthorDate: Fri Jun 30 19:05:45 2023 +0800
[doc] Document Prioritize write throughput mode
---
docs/content/maintenance/write-performance.md | 59 ++++++++++++++++++---------
1 file changed, 39 insertions(+), 20 deletions(-)
diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index 112ca9a47..140173c6f 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -58,19 +58,14 @@ It is recommended that the parallelism of sink should be less than or equal to t
</tbody>
</table>
-## Write Initialize
-
-In the initialization of write, the writer of the bucket needs to read all historical files. If there is a bottleneck
-here (For example, writing a large number of partitions simultaneously), you can use `write-manifest-cache` to cache
-the read manifest data to accelerate initialization.
-
## Compaction
-### Number of Sorted Runs to Trigger Compaction
-
-Paimon uses [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/file-layouts#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+### Number of Sorted Runs to Pause Writing
-One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/file-layouts#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
+When number of sorted runs is small, Paimon writers will perform compaction asynchronously in separated threads, so
+records can be continuously written into the table. However to avoid unbounded growth of sorted runs, writers will
+have to pause writing when the number of sorted runs hits the threshold. The following table property determines
+the threshold.
<table class="table table-bordered">
<thead>
@@ -84,20 +79,38 @@ One can easily see that too many sorted runs will result in poor query performan
</thead>
<tbody>
<tr>
- <td><h5>num-sorted-run.compaction-trigger</h5></td>
+ <td><h5>num-sorted-run.stop-trigger</h5></td>
<td>No</td>
- <td style="word-wrap: break-word;">5</td>
+ <td style="word-wrap: break-word;">(none)</td>
<td>Integer</td>
- <td>The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run).</td>
+ <td>The number of sorted runs that trigger the stopping of writes, the default value is 'num-sorted-run.compaction-trigger' + 1.</td>
</tr>
</tbody>
</table>
-Compaction will become less frequent when `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+Write stalls will become less frequent when `num-sorted-run.stop-trigger` becomes larger, thus improving writing
+performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the
+table. If you are concerned about the OOM of memory, please configure the following option `sort-spill-threshold`.
+Its value depends on your memory size.
-### Number of Sorted Runs to Pause Writing
+### Prioritize write throughput
+
+If you expect a mode to have maximum write throughput, the compaction can be done slowly and not in a hurry.
+You can use the following strategies for your table:
+
+```shell
+num-sorted-run.stop-trigger = 2147483647
+sort-spill-threshold = 10
+```
+
+This configuration will generate more files during peak write periods and gradually merge into optimal read
+performance during low write periods.
-When number of sorted runs is small, Paimon writers will perform compaction asynchronously in separated threads, so records can be continuously written into the table. However to avoid unbounded growth of sorted runs, writers will have to pause writing when the number of sorted runs hits the threshold. The following table property determines the threshold.
+### Number of Sorted Runs to Trigger Compaction
+
+Paimon uses [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/file-layouts#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+
+One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/file-layouts#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
<table class="table table-bordered">
<thead>
@@ -111,16 +124,22 @@ When number of sorted runs is small, Paimon writers will perform compaction asyn
</thead>
<tbody>
<tr>
- <td><h5>num-sorted-run.stop-trigger</h5></td>
+ <td><h5>num-sorted-run.compaction-trigger</h5></td>
<td>No</td>
- <td style="word-wrap: break-word;">(none)</td>
+ <td style="word-wrap: break-word;">5</td>
<td>Integer</td>
- <td>The number of sorted runs that trigger the stopping of writes, the default value is 'num-sorted-run.compaction-trigger' + 1.</td>
+ <td>The sorted run number to trigger compaction. Includes level0 files (one file one sorted run) and high-level runs (one level one sorted run).</td>
</tr>
</tbody>
</table>
-Write stalls will become less frequent when `num-sorted-run.stop-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+Compaction will become less frequent when `num-sorted-run.compaction-trigger` becomes larger, thus improving writing performance. However, if this value becomes too large, more memory and CPU time will be needed when querying the table. This is a trade-off between writing and query performance.
+
+## Write Initialize
+
+In the initialization of write, the writer of the bucket needs to read all historical files. If there is a bottleneck
+here (For example, writing a large number of partitions simultaneously), you can use `write-manifest-cache` to cache
+the read manifest data to accelerate initialization.
## Memory
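
Reviewer note (not part of the commit): the interaction between the two options documented above can be modeled with a small sketch. This is not Paimon code. The function names `effective_stop_trigger` and `should_pause_writes` are hypothetical, and the `>=` comparison is an assumption based on the wording "hits the threshold". The default of 5 and the "`num-sorted-run.compaction-trigger` + 1" fallback are taken from the tables in the diff.

```python
# Minimal model of the documented option semantics; NOT Paimon's implementation.
INT_MAX = 2147483647  # value used in the "Prioritize write throughput" example

DEFAULT_COMPACTION_TRIGGER = 5  # documented default of num-sorted-run.compaction-trigger


def effective_stop_trigger(options):
    """Resolve num-sorted-run.stop-trigger (hypothetical helper).

    When the option is unset, fall back to compaction-trigger + 1,
    as the option description in the diff states.
    """
    explicit = options.get("num-sorted-run.stop-trigger")
    if explicit is not None:
        return int(explicit)
    compaction_trigger = int(
        options.get("num-sorted-run.compaction-trigger", DEFAULT_COMPACTION_TRIGGER)
    )
    return compaction_trigger + 1


def should_pause_writes(num_sorted_runs, options):
    """Assumed rule: writing pauses once the run count reaches the threshold."""
    return num_sorted_runs >= effective_stop_trigger(options)


if __name__ == "__main__":
    defaults = {}
    throughput_mode = {
        "num-sorted-run.stop-trigger": str(INT_MAX),
        "sort-spill-threshold": "10",
    }
    for runs in (5, 6, 1000):
        print(runs, should_pause_writes(runs, defaults),
              should_pause_writes(runs, throughput_mode))
```

Under this model, setting the stop trigger to the maximum integer means the pause condition is effectively never reached, which matches the intent of the throughput-oriented configuration in the diff: compaction falls behind during peaks and catches up later.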