(incubator-paimon) branch master updated: [doc] Update documentation to better structure

lzljs3620320 Mon, 04 Mar 2024 00:44:20 -0800

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new c57998f7f [doc] Update documentation to better structure
c57998f7f is described below

commit c57998f7fa28f259a2253b0bd8d94dcd2300fe48
Author: Jingsong <[email protected]>
AuthorDate: Mon Mar 4 16:43:26 2024 +0800

    [doc] Update documentation to better structure
---
 .asf.yaml                                          |  2 +-
 README.md                                          | 10 +--
 docs/content/_index.md                             | 19 +++--
 docs/content/concepts/basic-concepts.md            | 46 +++++++++----
 docs/content/concepts/file-layouts.md              | 80 ----------------------
 docs/content/concepts/overview.md                  | 15 ++--
 .../primary-key-table/changelog-producer.md        |  2 +-
 .../primary-key-table/data-distribution.md         |  2 +-
 .../content/concepts/primary-key-table/overview.md | 36 ++++++++++
 docs/content/learn-paimon/understand-files.md      |  8 +--
 docs/content/maintenance/manage-snapshots.md       |  2 +-
 docs/content/maintenance/write-performance.md      |  4 +-
 12 files changed, 102 insertions(+), 124 deletions(-)

diff --git a/.asf.yaml b/.asf.yaml
index a860484d5..30bcffa6c 100644
--- a/.asf.yaml
+++ b/.asf.yaml
@@ -18,7 +18,7 @@
 # See: https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
 
 github:
-  description: "Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics."
+  description: "Apache Paimon(incubating) is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations."
   homepage: https://paimon.apache.org/
   labels:
     - paimon
diff --git a/README.md b/README.md
index 524c7b8b2..50c487fbc 100644
--- a/README.md
+++ b/README.md
@@ -3,12 +3,14 @@
 
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
 [![Get on Slack](https://img.shields.io/badge/slack-join-orange.svg)](https://the-asf.slack.com/archives/C053Q2NCW8G)
 
-Paimon is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
+Apache Paimon(incubating) is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark
+for both streaming and batch operations. Paimon innovatively combines lake format and LSM structure, bringing realtime
+streaming updates into the lake architecture.
 
 Background and documentation are available at https://paimon.apache.org
 
-`Paimon`'s former name was `Flink Table Store`, developed from the Flink community. The architecture refers to some design concepts of Iceberg.
-Thanks to Apache Flink and Apache Iceberg.
+`Paimon`'s former name was `Flink Table Store`, developed from the Flink community. The architecture refers to some
+design concepts of Iceberg. Thanks to Apache Flink and Apache Iceberg.
 
 ## Collaboration
 
@@ -64,8 +66,6 @@ You can join the Paimon community on Slack. Paimon channel is in ASF Slack works
 - If you don't have an @apache.org email address, you can email to `[email protected]` to apply for an
   [ASF Slack invitation](https://infra.apache.org/slack.html). Then join [Paimon channel](https://the-asf.slack.com/archives/C053Q2NCW8G).
 
-Don’t forget to introduce yourself in channel.
-
 ## Building
 
 JDK 8/11 is required for building the project.
diff --git a/docs/content/_index.md b/docs/content/_index.md
index d00771e0a..160dc4f8a 100644
--- a/docs/content/_index.md
+++ b/docs/content/_index.md
@@ -24,15 +24,22 @@ under the License.
 
 # Apache Paimon
 
-Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
+Apache Paimon(incubating) is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark
+for both streaming and batch operations. Paimon innovatively combines lake format and LSM (Log-structured merge-tree)
+structure, bringing realtime streaming updates into the lake architecture.
 
 Paimon offers the following core capabilities:
 
-- Unified Batch & Streaming: Paimon supports batch write and batch read, as well as streaming write changes and streaming read table changelogs.
-- Data Lake: As a data lake storage, Paimon has the following advantages: low cost, high reliability, and scalable metadata.
-- Merge Engines: Paimon supports rich Merge Engines. By default, the last entry of the primary key is reserved. You can also use the "partial-update" or "aggregation" engine.
-- Changelog producer: Paimon supports rich Changelog producers, such as "lookup" and "full-compaction". The correct changelog can simplify the construction of a streaming pipeline.
-- Append Only Tables: Paimon supports Append Only tables, automatically compact small files, and provides orderly stream reading. You can use this to replace message queues.
+- Realtime updates:
+  - Primary key table supports writing of large-scale updates, has very high update performance, typically through Flink Streaming.
+  - Support defining Merge Engines, update records however you like. Deduplicate to keep last row, or partial-update, or aggregate records, or first-row, you decide.
+  - Support defining changelog-producer, produce correct and complete changelog in updates for merge engines, simplifying your streaming analytics.
+- Huge Append Data Processing:
+  - Append table (no primary-key) provides large scale batch & streaming processing capability. Automatic Small File Merge.
+  - Supports Data Compaction with z-order sorting to optimize file layout, provides fast queries based on data skipping using indexes such as minmax.
+- Data Lake Capabilities:
+  - Scalable metadata: supports storing Petabyte large-scale datasets and storing a large number of partitions.
+  - Supports ACID Transactions & Time Travel & Schema Evolution.
 
 {{< columns >}}
 
diff --git a/docs/content/concepts/basic-concepts.md b/docs/content/concepts/basic-concepts.md
index ab6cd6685..0c57e2e34 100644
--- a/docs/content/concepts/basic-concepts.md
+++ b/docs/content/concepts/basic-concepts.md
@@ -26,34 +26,50 @@ under the License.
 
 # Basic Concepts
 
+## File Layouts
+
+All files of a table are stored under one base directory. Paimon files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Paimon readers can recursively access all records from the table.
+
+{{< img src="/img/file-layout.png">}}
+
 ## Snapshot
 
-A snapshot captures the state of a table at some point in time. Users can access the latest data of a table through the latest snapshot. By time traveling, users can also access the previous state of a table through an earlier snapshot.
+All snapshot files are stored in the `snapshot` directory.
 
-## Partition
+A snapshot file is a JSON file containing information about this snapshot, including
 
-Paimon adopts the same partitioning concept as Apache Hive to separate data.
+* the schema file in use
+* the manifest list containing all changes of this snapshot
 
-Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.
+A snapshot captures the state of a table at some point in time. Users can access the latest data of a table through the
+latest snapshot. By time traveling, users can also access the previous state of a table through an earlier snapshot.
 
-By partitioning, users can efficiently operate on a slice of records in the table. See [file layouts]({{< ref "concepts/file-layouts" >}}) for how files are divided into multiple partitions.
+## Manifest Files
 
-{{< hint info >}}
-If you need cross partition upsert (primary keys not contain all partition fields), see [Cross partition Upsert]({{< ref "concepts/primary-key-table/data-distribution#cross-partitions-upsert-dynamic-bucket-mode">}}) mode.
-{{< /hint >}}
+All manifest lists and manifest files are stored in the `manifest` directory.
 
-## Bucket
+A manifest list is a list of manifest file names.
 
-Unpartitioned tables, or partitions in partitioned tables, are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying.
+A manifest file is a file containing changes about LSM data files and changelog files. For example, which LSM data file is created and which file is deleted in the corresponding snapshot.
 
-The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the [`bucket-key` option]({{< ref "maintenance/configurations#coreoptions" >}}). If no `bucket-key` option is specified, the primary key (if defined) or the complete record will be used as the bucket key.
+## Data Files
 
-A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 200MB - 1GB.
+Data files are grouped by partitions. Currently, Paimon supports using orc (default), parquet and avro as data file's format.
+
+## Partition
+
+Paimon adopts the same partitioning concept as Apache Hive to separate data.
+
+Partitioning is an optional way of dividing a table into related parts based on the values of particular columns like date, city, and department. Each table can have one or more partition keys to identify a particular partition.
 
-See [file layouts]({{< ref "concepts/file-layouts" >}}) for how files are divided into buckets. Also, see [rescale bucket]({{< ref "maintenance/rescale-bucket" >}}) if you want to adjust the number of buckets after a table is created.
+By partitioning, users can efficiently operate on a slice of records in the table.
 
 ## Consistency Guarantees
 
-Paimon writers use two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}) at commit time.
+Paimon writers use two-phase commit protocol to atomically commit a batch of records to the table. Each commit produces
+at most two [snapshots]({{< ref "concepts/basic-concepts#snapshot" >}}) at commit time.
 
-For any two writers modifying a table at the same time, as long as they do not modify the same bucket, their commits can occur in parallel. If they modify the same bucket, only snapshot isolation is guaranteed. That is, the final table state may be a mix of the two commits, but no changes are lost.
+For any two writers modifying a table at the same time, as long as they do not modify the same partition, their commits
+can occur in parallel. If they modify the same partition, only snapshot isolation is guaranteed. That is, the final table
+state may be a mix of the two commits, but no changes are lost.
+See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.
diff --git a/docs/content/concepts/file-layouts.md b/docs/content/concepts/file-layouts.md
deleted file mode 100644
index 1b568e032..000000000
--- a/docs/content/concepts/file-layouts.md
+++ /dev/null
@@ -1,80 +0,0 @@
----
-title: "File Layouts"
-weight: 3
-type: docs
-aliases:
-- /concepts/file-layouts.html
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# File Layouts
-
-All files of a table are stored under one base directory. Paimon files are organized in a layered style. The following image illustrates the file layout. Starting from a snapshot file, Paimon readers can recursively access all records from the table.
-
-{{< img src="/img/file-layout.png">}}
-
-## Snapshot Files
-
-All snapshot files are stored in the `snapshot` directory.
-
-A snapshot file is a JSON file containing information about this snapshot, including
-
-* the schema file in use
-* the manifest list containing all changes of this snapshot
-
-## Manifest Files
-
-All manifest lists and manifest files are stored in the `manifest` directory.
-
-A manifest list is a list of manifest file names.
-
-A manifest file is a file containing changes about LSM data files and changelog files. For example, which LSM data file is created and which file is deleted in the corresponding snapshot.
-
-## Data Files
-
-Data files are grouped by partitions and buckets. Each bucket directory contains an [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) and its [changelog files]({{< ref "concepts/primary-key-table/changelog-producer" >}}).
-
-Currently, Paimon supports using orc(default), parquet and avro as data file's format.
-
-## LSM Trees
-
-Paimon adapts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts about LSM trees.
-
-### Sorted Runs
-
-LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple [data files]({{< ref "concepts/file-layouts#data-files" >}}) and each data file belongs to exactly one sorted run.
-
-Records within a data file are sorted by their primary keys. Within a sorted run, ranges of primary keys of data files never overlap.
-
-{{< img src="/img/sorted-runs.png">}}
-
-As you can see, different sorted runs may have overlapping primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "concepts/primary-key-table/merge-engine" >}}) and the timestamp of each record.
-
-New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
-
-### Compaction
-
-When more and more records are written into the LSM tree, the number of sorted runs will increase. Because querying an LSM tree requires all sorted runs to be combined, too many sorted runs will result in a poor query performance, or even out of memory.
-
-To limit the number of sorted runs, we have to merge several sorted runs into one big sorted run once in a while. This procedure is called compaction.
-
-However, compaction is a resource intensive procedure which consumes a certain amount of CPU time and disk IO, so too frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Paimon currently adapts a compaction strategy similar to Rocksdb's [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).
-
-By default, when Paimon appends records to the LSM tree, it will also perform compactions as needed. Users can also choose to perform all compactions in a dedicated compaction job. See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.
diff --git a/docs/content/concepts/overview.md b/docs/content/concepts/overview.md
index 9dce249f7..269107861 100644
--- a/docs/content/concepts/overview.md
+++ b/docs/content/concepts/overview.md
@@ -26,9 +26,7 @@ under the License.
 
 # Overview
 
-Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
-
-## Architecture
+Apache Paimon(incubating)'s Architecture:
 
 {{< img src="/img/architecture.png">}}
 
@@ -39,14 +37,17 @@ As shown in the architecture above:
   - from historical snapshots (in batch mode),
   - from the latest offset (in streaming mode), or 
   - reading incremental snapshots in a hybrid way.
-- For writes, it supports streaming synchronization from the changelog of databases (CDC) or batch
-  insert/overwrite from offline data.
+- For writes, it supports
+  - streaming synchronization from the changelog of databases (CDC)
+  - batch insert/overwrite from offline data.
 
 **Ecosystem:** In addition to Apache Flink, Paimon also supports read by other computation
 engines like Apache Hive, Apache Spark and Trino.
 
-**Internal:** Under the hood, Paimon stores the columnar files on the filesystem/object-store and uses
-the LSM tree structure to support a large volume of data updates and high-performance queries.
+**Internal:**
+- Under the hood, Paimon stores the columnar files on the filesystem/object-store
+- The metadata of the file is saved in the manifest file, providing large-scale storage and data skipping.
+- For primary key table, uses the LSM tree structure to support a large volume of data updates and high-performance queries.
 
 ## Unified Storage
 
diff --git a/docs/content/concepts/primary-key-table/changelog-producer.md b/docs/content/concepts/primary-key-table/changelog-producer.md
index 0853e73bd..f0b132e5b 100644
--- a/docs/content/concepts/primary-key-table/changelog-producer.md
+++ b/docs/content/concepts/primary-key-table/changelog-producer.md
@@ -52,7 +52,7 @@ will be very costly and should be avoided. (You can force removing "normalize" o
 
 ## Input
 
-By specifying `'changelog-producer' = 'input'`, Paimon writers rely on their inputs as a source of complete changelog. All input records will be saved in separated [changelog files]({{< ref "concepts/file-layouts" >}}) and will be given to the consumers by Paimon sources.
+By specifying `'changelog-producer' = 'input'`, Paimon writers rely on their inputs as a source of complete changelog. All input records will be saved in separated changelog files and will be given to the consumers by Paimon sources.
 
 `input` changelog producer can be used when Paimon writers' inputs are complete changelog, such as from a database CDC, or generated by Flink stateful computation.
 
diff --git a/docs/content/concepts/primary-key-table/data-distribution.md b/docs/content/concepts/primary-key-table/data-distribution.md
index 55d54b53d..e0ed5fd57 100644
--- a/docs/content/concepts/primary-key-table/data-distribution.md
+++ b/docs/content/concepts/primary-key-table/data-distribution.md
@@ -31,7 +31,7 @@ By default, Paimon table only has one bucket, which means it only provides singl
 Please configure the bucket strategy to your table.
 {{< /hint >}}
 
-A bucket is the smallest storage unit for reads and writes, each bucket directory contains an [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}).
+A bucket is the smallest storage unit for reads and writes, each bucket directory contains an [LSM tree]({{< ref "concepts/primary-key-table/overview#lsm-trees" >}}).
 
 ## Fixed Bucket
 
diff --git a/docs/content/concepts/primary-key-table/overview.md b/docs/content/concepts/primary-key-table/overview.md
index e7f379a48..bcd1c24c5 100644
--- a/docs/content/concepts/primary-key-table/overview.md
+++ b/docs/content/concepts/primary-key-table/overview.md
@@ -32,4 +32,40 @@ Primary keys consist of a set of columns that contain unique values for each rec
 sorting the primary key within each bucket, allowing users to achieve high performance by applying filtering conditions
 on the primary key. See [CREATE TABLE]({{< ref "how-to/creating-tables" >}}).
 
+## Bucket
 
+Unpartitioned tables, or partitions in partitioned tables, are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying.
+
+Each bucket directory contains an LSM tree and its [changelog files]({{< ref "concepts/primary-key-table/changelog-producer" >}}).
+
+The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the [`bucket-key` option]({{< ref "maintenance/configurations#coreoptions" >}}). If no `bucket-key` option is specified, the primary key (if defined) or the complete record will be used as the bucket key.
+
+A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 200MB - 1GB.
+
+Also, see [rescale bucket]({{< ref "maintenance/rescale-bucket" >}}) if you want to adjust the number of buckets after a table is created.
+
+## LSM Trees
+
+Paimon adapts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts about LSM trees.
+
+### Sorted Runs
+
+LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple data files and each data file belongs to exactly one sorted run.
+
+Records within a data file are sorted by their primary keys. Within a sorted run, ranges of primary keys of data files never overlap.
+
+{{< img src="/img/sorted-runs.png">}}
+
+As you can see, different sorted runs may have overlapping primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine]({{< ref "concepts/primary-key-table/merge-engine" >}}) and the timestamp of each record.
+
+New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
+
+### Compaction
+
+When more and more records are written into the LSM tree, the number of sorted runs will increase. Because querying an LSM tree requires all sorted runs to be combined, too many sorted runs will result in a poor query performance, or even out of memory.
+
+To limit the number of sorted runs, we have to merge several sorted runs into one big sorted run once in a while. This procedure is called compaction.
+
+However, compaction is a resource intensive procedure which consumes a certain amount of CPU time and disk IO, so too frequent compaction may in turn result in slower writes. It is a trade-off between query and write performance. Paimon currently adapts a compaction strategy similar to Rocksdb's [universal compaction](https://github.com/facebook/rocksdb/wiki/Universal-Compaction).
+
+By default, when Paimon appends records to the LSM tree, it will also perform compactions as needed. Users can also choose to perform all compactions in a dedicated compaction job. See [dedicated compaction job]({{< ref "maintenance/dedicated-compaction#dedicated-compaction-job" >}}) for more info.
diff --git a/docs/content/learn-paimon/understand-files.md b/docs/content/learn-paimon/understand-files.md
index 88899aacf..0649b5092 100644
--- a/docs/content/learn-paimon/understand-files.md
+++ b/docs/content/learn-paimon/understand-files.md
@@ -40,7 +40,7 @@ Before delving further into this page, please ensure that you have read through
 following sections:
 
 1. [Basic Concepts]({{< ref "concepts/basic-concepts" >}}),
-2. [File Layouts]({{< ref "concepts/file-layouts" >}}) and
+2. [Primary Key Table]({{< ref "concepts/primary-key-table/overview" >}}) and
 3. How to use Paimon in [Flink]({{< ref "engines/flink" >}}).
 
 ## Understand File Operations
@@ -406,8 +406,6 @@ spilled files in writer to generate bigger files in DFS.
 
 ### Understand Snapshots
 
-Before delving further into this section, please ensure that you have read [File Layouts]({{< ref "concepts/file-layouts" >}}).
-
 {{< img src="/img/file-operations-3.png">}}
 
 Paimon maintains multiple versions of files, compaction and deletion of files are logical and do not actually
@@ -444,8 +442,8 @@ of buckets, otherwise there will be quite a few small files as well.
 
 ### Understand LSM for Primary Table
 
-LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple
-[data files]({{< ref "concepts/file-layouts#data-files" >}}) and each data file belongs to exactly one sorted run.
+LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple data files and each data
+file belongs to exactly one sorted run.
 
 {{< img src="/img/sorted-runs.png">}}
 
diff --git a/docs/content/maintenance/manage-snapshots.md b/docs/content/maintenance/manage-snapshots.md
index 67d447e93..661ac1064 100644
--- a/docs/content/maintenance/manage-snapshots.md
+++ b/docs/content/maintenance/manage-snapshots.md
@@ -30,7 +30,7 @@ This section will describe the management and behavior related to snapshots.
 
 ## Expire Snapshots
 
-Paimon writers generate one or two [snapshots]({{< ref "concepts/basic-concepts#snapshots" >}}) per commit. Each snapshot may add some new data files or mark some old data files as deleted. However, the marked data files are not truly deleted because Paimon also supports time traveling to an earlier snapshot. They are only deleted when the snapshot expires.
+Paimon writers generate one or two [snapshot]({{< ref "concepts/basic-concepts#snapshot" >}}) per commit. Each snapshot may add some new data files or mark some old data files as deleted. However, the marked data files are not truly deleted because Paimon also supports time traveling to an earlier snapshot. They are only deleted when the snapshot expires.
 
 Currently, expiration is automatically performed by Paimon writers when committing new changes. By expiring old snapshots, old data files and metadata files that are no longer used can be deleted to release disk space.
 
diff --git a/docs/content/maintenance/write-performance.md b/docs/content/maintenance/write-performance.md
index bfec08d01..1386e5858 100644
--- a/docs/content/maintenance/write-performance.md
+++ b/docs/content/maintenance/write-performance.md
@@ -143,9 +143,9 @@ Its value depends on your memory size.
 
 ### Number of Sorted Runs to Trigger Compaction
 
-Paimon uses [LSM tree]({{< ref "concepts/file-layouts#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/file-layouts#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+Paimon uses [LSM tree]({{< ref "concepts/primary-key-table/overview#lsm-trees" >}}) which supports a large number of updates. LSM organizes files in several [sorted runs]({{< ref "concepts/primary-key-table/overview#sorted-runs" >}}). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
 
-One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/file-layouts#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
+One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions]({{< ref "concepts/primary-key-table/overview#compaction" >}}). The following table property determines the minimum number of sorted runs to trigger a compaction.
 
 <table class="table table-bordered">
     <thead>

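Reader's note on the "Sorted Runs" and "Compaction" text that this commit moves into `concepts/primary-key-table/overview.md`: the docs say that different sorted runs may contain the same primary key, and that a query has to combine all runs and merge records sharing a key. The Python sketch below illustrates that idea only. The `(key, sequence, value)` record layout, the function name `merge_sorted_runs`, and the keep-the-highest-sequence rule (a stand-in for the "deduplicate to keep last row" behaviour and the per-record timestamp mentioned in the docs) are all assumptions made for illustration; this is not Paimon's implementation or API.

```python
import heapq
from typing import List, Tuple

# Hypothetical record layout: (primary_key, sequence_number, value).
# The sequence number stands in for the per-record timestamp in the docs.
Record = Tuple[str, int, str]


def merge_sorted_runs(runs: List[List[Record]]) -> List[Record]:
    """Combine sorted runs and keep one record per primary key.

    Each run must already be sorted by primary key, with each key
    appearing at most once per run. When several runs hold the same
    key, the record with the highest sequence number wins.
    """
    # k-way merge ordered by (key, sequence), so records with the same
    # key arrive next to each other, oldest first.
    merged = heapq.merge(*runs, key=lambda r: (r[0], r[1]))

    result: List[Record] = []
    for record in merged:
        if result and result[-1][0] == record[0]:
            result[-1] = record  # newer record replaces the older one
        else:
            result.append(record)
    return result


if __name__ == "__main__":
    run_a = [("a", 1, "a1"), ("c", 2, "c2")]
    run_b = [("a", 3, "a3"), ("b", 4, "b4")]
    for rec in merge_sorted_runs([run_a, run_b]):
        print(rec)
```

Writing the merged result back as a single sorted run is, in spirit, what the docs call compaction: later reads then have one run to scan instead of several. Other merge engines (partial-update, aggregation, first-row) would replace the "newer record wins" step with their own rule.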