This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new ac69df2b4 [doc] Move read performance to primary key table
ac69df2b4 is described below
commit ac69df2b479d93af6dfef6c2bf873d67c96baebe
Author: Jingsong <[email protected]>
AuthorDate: Wed Apr 3 14:43:25 2024 +0800
[doc] Move read performance to primary key table
---
.../primary-key-table/read-optimized.md} | 19 ++++++++++++-------
docs/content/learn-paimon/understand-files.md | 3 +--
2 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/docs/content/maintenance/read-performance.md b/docs/content/concepts/primary-key-table/read-optimized.md
similarity index 73%
rename from docs/content/maintenance/read-performance.md
rename to docs/content/concepts/primary-key-table/read-optimized.md
index 6b9ba3462..7be681124 100644
--- a/docs/content/maintenance/read-performance.md
+++ b/docs/content/concepts/primary-key-table/read-optimized.md
@@ -1,9 +1,9 @@
---
-title: "Read Performance"
-weight: 2
+title: "Read Optimized"
+weight: 7
type: docs
aliases:
-- /maintenance/read-performance.html
+- /concepts/primary-key-table/read-optimized.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
@@ -24,18 +24,23 @@ specific language governing permissions and limitations
under the License.
-->
-# Read Performance
+# Read Optimized
-## Primary Key Table
+## Overview
For Primary Key Table, it's a 'MergeOnRead' technology. When reading data, multiple layers of LSM data are merged,
and the number of parallelism will be limited by the number of buckets. Although Paimon's merge performance is efficient,
it still cannot catch up with the ordinary AppendOnly table.
-If you want to query fast enough in certain scenarios, but can only find older data, you can:
+We recommend that you use [Deletion Vectors]({{< ref "concepts/primary-key-table/deletion-vectors" >}}) mode.
+
+If you don't want to use Deletion Vectors mode, you want to query fast enough in certain scenarios, but can only find
+older data, you can also:
1. Configure 'compaction.optimization-interval' when writing data. For streaming jobs, optimized compaction will then
- be performed periodically; For batch jobs, optimized compaction will be carried out when the job ends.
+ be performed periodically; For batch jobs, optimized compaction will be carried out when the job ends. (Or configure
+ `'full-compaction.delta-commits'`, its disadvantage is that it can only perform compaction synchronously, which will
+ affect writing efficiency)
2. Query from [read-optimized system table]({{< ref "how-to/system-tables#read-optimized-table" >}}). Reading from
results of optimized files avoids merging records with the same key, thus improving reading performance.
diff --git a/docs/content/learn-paimon/understand-files.md b/docs/content/learn-paimon/understand-files.md
index 0649b5092..aa8aa6ec7 100644
--- a/docs/content/learn-paimon/understand-files.md
+++ b/docs/content/learn-paimon/understand-files.md
@@ -463,6 +463,5 @@ Maybe you think the 5 files for the primary key table are actually okay, but the
may have 50 small files in a single bucket, which is very difficult to accept. Worse still, partitions that
are no longer active also keep so many small files.
-It is recommended that you configure [Full-Compaction]({{< ref "/maintenance/read-performance#full-compaction" >}}),
-configure ‘full-compaction.delta-commits’ perform full-compaction periodically in Flink writing. And it can ensure
+Configure ‘full-compaction.delta-commits’ perform full-compaction periodically in Flink writing. And it can ensure
that partitions are full compacted before writing ends.