This is an automated email from the ASF dual-hosted git repository.
JingsongLi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new e36264f9b3 [doc] Merge overview pages into category index to eliminate empty landing pages
e36264f9b3 is described below
commit e36264f9b3114c9dc6293fd138ac429663d49719
Author: JingsongLi <[email protected]>
AuthorDate: Sat May 23 22:33:29 2026 +0800
[doc] Merge overview pages into category index to eliminate empty landing pages
Previously, clicking a category title in the sidebar showed a blank page
because all index.md files were empty. This merges the overview content
into each category's index file so users see useful content immediately.
---
docs/docs/append-table/index.md | 23 -----
docs/docs/append-table/{overview.mdx => index.mdx} | 34 +------
docs/docs/cdc-ingestion/index.md | 23 -----
.../docs/cdc-ingestion/{overview.mdx => index.mdx} | 6 +-
docs/docs/concepts/catalog.md | 2 +-
docs/docs/concepts/index.md | 45 +++++++++
docs/docs/concepts/overview.md | 68 -------------
docs/docs/concepts/rest/index.md | 43 ++++++++
docs/docs/concepts/rest/overview.md | 66 ------------
docs/docs/concepts/rest/tables.mdx | 4 +-
docs/docs/concepts/spec/index.md | 59 +++++++++++
docs/docs/concepts/spec/overview.md | 82 ---------------
docs/docs/concepts/views.md | 2 +-
docs/docs/ecosystem/index.md | 56 +++++++++++
docs/docs/ecosystem/overview.md | 79 ---------------
docs/docs/flink/action-jars.md | 2 +-
docs/docs/flink/sql-write.mdx | 12 +--
docs/docs/iceberg/index.md | 88 ++++++++++++++++
docs/docs/iceberg/overview.md | 111 ---------------------
docs/docs/index.mdx | 12 +--
docs/docs/learn-paimon/scenario-guide.mdx | 8 +-
docs/docs/learn-paimon/understand-files.mdx | 2 +-
docs/docs/maintenance/dedicated-compaction.mdx | 2 +-
docs/docs/migration/migration-from-hive.mdx | 2 +-
docs/docs/primary-key-table/compaction.md | 2 +-
docs/docs/primary-key-table/data-distribution.md | 2 +-
docs/docs/primary-key-table/index.md | 36 +++++++
docs/docs/primary-key-table/merge-engine/index.md | 21 ++++
.../primary-key-table/merge-engine/overview.md | 44 --------
docs/docs/primary-key-table/overview.md | 59 -----------
docs/docs/program-api/rest-api.mdx | 2 +-
docs/docs/pypaimon/index.md | 29 ++++++
docs/docs/pypaimon/overview.md | 52 ----------
docs/docs/pypaimon/python-api.mdx | 2 +-
docs/sidebars.js | 19 ++--
35 files changed, 418 insertions(+), 681 deletions(-)
diff --git a/docs/docs/append-table/index.md b/docs/docs/append-table/index.md
deleted file mode 100644
index 1b47be4a90..0000000000
--- a/docs/docs/append-table/index.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: "Table w/o PK"
-sidebar_position: 3
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
diff --git a/docs/docs/append-table/overview.mdx b/docs/docs/append-table/index.mdx
similarity index 88%
rename from docs/docs/append-table/overview.mdx
rename to docs/docs/append-table/index.mdx
index 59d3a785f4..2014eb336e 100644
--- a/docs/docs/append-table/overview.mdx
+++ b/docs/docs/append-table/index.mdx
@@ -1,6 +1,6 @@
---
-title: "Overview"
-sidebar_position: 1
+title: "Table w/o PK"
+sidebar_position: 3
---
import Tabs from '@theme/Tabs';
@@ -65,32 +65,6 @@ the Hive table, it can bring:
5. Incremental Clustering with z-order/hilbert/order sorting to optimize data layout at low cost.
6. Streaming read & write like a queue, DELETE / UPDATE / MERGE INTO support low-cost row-level operations.
----
-title: "Streaming"
-weight: 2
-type: docs
-aliases:
-- /append-table/streaming.html
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
## Append Streaming
You can stream write to the Append table in a very flexible way through Flink, or read the Append table through
@@ -105,8 +79,8 @@ If Flink's checkpoint interval is short (for example, 30 seconds), each snapshot
files. Too many files may put a burden on the distributed storage cluster.
In order to compact small changelog files into large ones, you can set the table option `precommit-compact = true`.
-Default value of this option is false, if true, it will add a compact coordinator and worker operator after the writer
-operator, which copies changelog files into large ones.
+Default value of this option is false, if true, it will add a compact coordinator and worker operator after the
+writer operator, which copies changelog files into large ones.
**Post small files merging**
diff --git a/docs/docs/cdc-ingestion/index.md b/docs/docs/cdc-ingestion/index.md
deleted file mode 100644
index 37cc4e086e..0000000000
--- a/docs/docs/cdc-ingestion/index.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-title: "CDC Ingestion"
-sidebar_position: 91
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
diff --git a/docs/docs/cdc-ingestion/overview.mdx b/docs/docs/cdc-ingestion/index.mdx
similarity index 99%
rename from docs/docs/cdc-ingestion/overview.mdx
rename to docs/docs/cdc-ingestion/index.mdx
index 5ff105b656..d57be3992d 100644
--- a/docs/docs/cdc-ingestion/overview.mdx
+++ b/docs/docs/cdc-ingestion/index.mdx
@@ -1,6 +1,6 @@
---
-title: "Overview"
-sidebar_position: 1
+title: "CDC Ingestion"
+sidebar_position: 91
---
import ConfigTable from '@site/src/components/ConfigTable';
@@ -155,4 +155,4 @@ Use `-Dpipeline.name=<job-name>` to set custom synchronization job name.
You can use `--table_conf` to set table properties and some flink job properties (like `sink.parallelism`). If the table is
created by the cdc job, the table's properties will be equal to the given properties. Otherwise, the job will use the given
-properties to alter table's properties. But note that immutable options (like `merge-engine`) and bucket number won't be altered.
\ No newline at end of file
+properties to alter table's properties. But note that immutable options (like `merge-engine`) and bucket number won't be altered.
diff --git a/docs/docs/concepts/catalog.md b/docs/docs/concepts/catalog.md
index 7d16131e1b..3eb9f2c135 100644
--- a/docs/docs/concepts/catalog.md
+++ b/docs/docs/concepts/catalog.md
@@ -52,7 +52,7 @@ CREATE CATALOG my_catalog WITH (
## REST Catalog
By using the Paimon REST catalog, changes to the catalog will be directly stored in a remote catalog server which exposed through REST API.
-See [Paimon REST Catalog](./rest/overview).
+See [Paimon REST Catalog](./rest/).
## Hive Catalog
diff --git a/docs/docs/concepts/index.md b/docs/docs/concepts/index.md
index e550041e2d..b50cf35be6 100644
--- a/docs/docs/concepts/index.md
+++ b/docs/docs/concepts/index.md
@@ -21,3 +21,48 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+Apache Paimon's Architecture:
+
+
+
+As shown in the architecture above:
+
+**Read/Write:** Paimon supports a versatile way to read/write data and perform OLAP queries.
+- For reads, it supports consuming data
+ - from historical snapshots (in batch mode),
+ - from the latest offset (in streaming mode), or
+ - reading incremental snapshots in a hybrid way.
+- For writes, it supports
+ - streaming synchronization from the changelog of databases (CDC)
+ - batch insert/overwrite from offline data.
+
+**Ecosystem:** In addition to Apache Flink, Paimon also supports read by other computation
+engines like Apache Spark, StarRocks, Apache Doris, Apache Hive and Trino.
+
+**Internal:**
+- Under the hood, Paimon stores the columnar files on the filesystem/object-store
+- The metadata of the file is saved in the manifest file, providing large-scale storage and data skipping.
+- For primary key table, uses the LSM tree structure to support a large volume of data updates and high-performance queries.
+
+## Unified Storage
+
+For streaming engines like Apache Flink, there are typically three types of connectors:
+- Message queue, such as Apache Kafka, it is used in both source and
+ intermediate stages in this pipeline, to guarantee the latency stay
+ within seconds.
+- OLAP system, such as ClickHouse, it receives processed data in
+ streaming fashion and serving user's ad-hoc queries.
+- Batch storage, such as Apache Hive, it supports various operations
+ of the traditional batch processing, including `INSERT OVERWRITE`.
+
+Paimon provides table abstraction. It is used in a way that
+does not differ from the traditional database:
+- In `batch` execution mode, it acts like a Hive table and
+ supports various operations of Batch SQL. Query it to see the
+ latest snapshot.
+- In `streaming` execution mode, it acts like a message queue.
+ Query it acts like querying a stream changelog from a message queue
+ where historical data never expires.
diff --git a/docs/docs/concepts/overview.md b/docs/docs/concepts/overview.md
deleted file mode 100644
index 8eaa6f558d..0000000000
--- a/docs/docs/concepts/overview.md
+++ /dev/null
@@ -1,68 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-Apache Paimon's Architecture:
-
-
-
-As shown in the architecture above:
-
-**Read/Write:** Paimon supports a versatile way to read/write data and perform OLAP queries.
-- For reads, it supports consuming data
- - from historical snapshots (in batch mode),
- - from the latest offset (in streaming mode), or
- - reading incremental snapshots in a hybrid way.
-- For writes, it supports
- - streaming synchronization from the changelog of databases (CDC)
- - batch insert/overwrite from offline data.
-
-**Ecosystem:** In addition to Apache Flink, Paimon also supports read by other computation
-engines like Apache Spark, StarRocks, Apache Doris, Apache Hive and Trino.
-
-**Internal:**
-- Under the hood, Paimon stores the columnar files on the filesystem/object-store
-- The metadata of the file is saved in the manifest file, providing large-scale storage and data skipping.
-- For primary key table, uses the LSM tree structure to support a large volume of data updates and high-performance queries.
-
-## Unified Storage
-
-For streaming engines like Apache Flink, there are typically three types of connectors:
-- Message queue, such as Apache Kafka, it is used in both source and
- intermediate stages in this pipeline, to guarantee the latency stay
- within seconds.
-- OLAP system, such as ClickHouse, it receives processed data in
- streaming fashion and serving user's ad-hoc queries.
-- Batch storage, such as Apache Hive, it supports various operations
- of the traditional batch processing, including `INSERT OVERWRITE`.
-
-Paimon provides table abstraction. It is used in a way that
-does not differ from the traditional database:
-- In `batch` execution mode, it acts like a Hive table and
- supports various operations of Batch SQL. Query it to see the
- latest snapshot.
-- In `streaming` execution mode, it acts like a message queue.
- Query it acts like querying a stream changelog from a message queue
- where historical data never expires.
diff --git a/docs/docs/concepts/rest/index.md b/docs/docs/concepts/rest/index.md
index 6d2ecc656b..bd8ada4b67 100644
--- a/docs/docs/concepts/rest/index.md
+++ b/docs/docs/concepts/rest/index.md
@@ -21,3 +21,46 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# RESTCatalog
+
+## Overview
+
+Paimon REST Catalog provides a lightweight implementation to access the catalog service. Paimon could access the
+catalog service through a catalog server which implements REST API. You can see all APIs in [REST API](./rest-api).
+
+
+
+## Key Features
+
+1. User Defined Technology-Specific Logic Implementation
+ - All technology-specific logic within the catalog server.
+ - This ensures that the user can define logic that could be owned by the user.
+2. Decoupled Architecture
+ - The REST Catalog interacts with the catalog server through a well-defined REST API.
+ - This decoupling allows for independent evolution and scaling of the catalog server and clients.
+3. Language Agnostic
+ - Developers can implement the catalog server in any programming language, provided that it adheres to the specified REST API.
+ - This flexibility enables teams to utilize their existing tech stacks and expertise.
+4. Support for Any Catalog Backend
+ - REST Catalog is designed to work with any catalog backend.
+ - As long as they implement the relevant APIs, they can seamlessly integrate with REST Catalog.
+
+## Conclusion
+
+REST Catalog offers adaptable solution for accessing the catalog service. According to [REST API](./rest-api) is decoupled
+from the catalog service.
+
+Technology-specific Logic is encapsulated on the catalog server. At the same time, the catalog server supports any
+backend and languages.
+
+## Token Provider
+
+RESTCatalog supports multiple access authentication methods, including the following:
+
+1. [Bear Token](./bear).
+2. [DLF Token](./dlf).
+
+## REST Open API
+
+See [REST API](./rest-api).
diff --git a/docs/docs/concepts/rest/overview.md b/docs/docs/concepts/rest/overview.md
deleted file mode 100644
index d73863a2d3..0000000000
--- a/docs/docs/concepts/rest/overview.md
+++ /dev/null
@@ -1,66 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# RESTCatalog
-
-## Overview
-
-Paimon REST Catalog provides a lightweight implementation to access the catalog service. Paimon could access the
-catalog service through a catalog server which implements REST API. You can see all APIs in [REST API](./rest-api).
-
-
-
-## Key Features
-
-1. User Defined Technology-Specific Logic Implementation
- - All technology-specific logic within the catalog server.
- - This ensures that the user can define logic that could be owned by the user.
-2. Decoupled Architecture
- - The REST Catalog interacts with the catalog server through a well-defined REST API.
- - This decoupling allows for independent evolution and scaling of the catalog server and clients.
-3. Language Agnostic
- - Developers can implement the catalog server in any programming language, provided that it adheres to the specified REST API.
- - This flexibility enables teams to utilize their existing tech stacks and expertise.
-4. Support for Any Catalog Backend
- - REST Catalog is designed to work with any catalog backend.
- - As long as they implement the relevant APIs, they can seamlessly integrate with REST Catalog.
-
-## Conclusion
-
-REST Catalog offers adaptable solution for accessing the catalog service. According to [REST API](./rest-api) is decoupled
-from the catalog service.
-
-Technology-specific Logic is encapsulated on the catalog server. At the same time, the catalog server supports any
-backend and languages.
-
-## Token Provider
-
-RESTCatalog supports multiple access authentication methods, including the following:
-
-1. [Bear Token](./bear).
-2. [DLF Token](./dlf).
-
-## REST Open API
-
-See [REST API](./rest-api).
diff --git a/docs/docs/concepts/rest/tables.mdx b/docs/docs/concepts/rest/tables.mdx
index 508b715f25..87b6d1c059 100644
--- a/docs/docs/concepts/rest/tables.mdx
+++ b/docs/docs/concepts/rest/tables.mdx
@@ -38,7 +38,7 @@ Paimon supports tables:
### Primary Key Table
-See [Paimon with Primary key](../../primary-key-table/overview).
+See [Paimon with Primary key](../../primary-key-table/).
Primary keys consist of a set of columns that contain unique values for each record. Paimon enforces data ordering by
sorting the primary key within each bucket, allowing streaming update and streaming changelog read.
@@ -79,7 +79,7 @@ CREATE TABLE my_table (
### Append Table
-See [Append Table](../../append-table/overview).
+See [Append Table](../../append-table/).
If a table does not have a primary key defined, it is an append table. Compared to the primary key table, it does not
have the ability to directly receive changelogs. It cannot be directly updated with data through streaming upsert. It
diff --git a/docs/docs/concepts/spec/index.md b/docs/docs/concepts/spec/index.md
index 746a635d1f..9bac437d98 100644
--- a/docs/docs/concepts/spec/index.md
+++ b/docs/docs/concepts/spec/index.md
@@ -21,3 +21,62 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Spec Overview
+
+This is the specification for the Paimon table format, this document standardizes the underlying file structure and
+design of Paimon.
+
+
+
+## Terms
+
+- Schema: fields, primary keys definition, partition keys definition and options.
+- Snapshot: the entrance to all data committed at some specific time point.
+- Manifest list: includes several manifest files.
+- Manifest: includes several data files or changelog files.
+- Data File: contains incremental records.
+- Changelog File: contains records produced by changelog-producer.
+- Global Index: index for a bucket or partition.
+- Data File Index: index for a data file.
+
+Run Flink SQL with Paimon:
+
+```sql
+CREATE CATALOG my_catalog WITH (
+ 'type' = 'paimon',
+ 'warehouse' = '/your/path'
+);
+USE CATALOG my_catalog;
+
+CREATE TABLE my_table (
+ k INT PRIMARY KEY NOT ENFORCED,
+ f0 INT,
+ f1 STRING
+);
+
+INSERT INTO my_table VALUES (1, 11, '111');
+```
+
+Take a look to the disk:
+
+```shell
+warehouse
+└── default.db
+ └── my_table
+ ├── bucket-0
+ │ └── data-59f60cb9-44af-48cc-b5ad-59e85c663c8f-0.orc
+ ├── index
+ │ └── index-5625e6d9-dd44-403b-a738-2b6ea92e20f1-0
+ ├── manifest
+ │ ├── index-manifest-5d670043-da25-4265-9a26-e31affc98039-0
+ │ ├── manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0
+ │ ├── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-0
+ │ └── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-1
+ ├── schema
+ │ └── schema-0
+ └── snapshot
+ ├── EARLIEST
+ ├── LATEST
+ └── snapshot-1
+```
diff --git a/docs/docs/concepts/spec/overview.md b/docs/docs/concepts/spec/overview.md
deleted file mode 100644
index 346e02759b..0000000000
--- a/docs/docs/concepts/spec/overview.md
+++ /dev/null
@@ -1,82 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Spec Overview
-
-This is the specification for the Paimon table format, this document standardizes the underlying file structure and
-design of Paimon.
-
-
-
-## Terms
-
-- Schema: fields, primary keys definition, partition keys definition and options.
-- Snapshot: the entrance to all data committed at some specific time point.
-- Manifest list: includes several manifest files.
-- Manifest: includes several data files or changelog files.
-- Data File: contains incremental records.
-- Changelog File: contains records produced by changelog-producer.
-- Global Index: index for a bucket or partition.
-- Data File Index: index for a data file.
-
-Run Flink SQL with Paimon:
-
-```sql
-CREATE CATALOG my_catalog WITH (
- 'type' = 'paimon',
- 'warehouse' = '/your/path'
-);
-USE CATALOG my_catalog;
-
-CREATE TABLE my_table (
- k INT PRIMARY KEY NOT ENFORCED,
- f0 INT,
- f1 STRING
-);
-
-INSERT INTO my_table VALUES (1, 11, '111');
-```
-
-Take a look to the disk:
-
-```shell
-warehouse
-└── default.db
- └── my_table
- ├── bucket-0
- │ └── data-59f60cb9-44af-48cc-b5ad-59e85c663c8f-0.orc
- ├── index
- │ └── index-5625e6d9-dd44-403b-a738-2b6ea92e20f1-0
- ├── manifest
- │ ├── index-manifest-5d670043-da25-4265-9a26-e31affc98039-0
- │ ├── manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0
- │ ├── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-0
- │ └── manifest-list-9f856d52-5b33-4c10-8933-a0eddfaa25bf-1
- ├── schema
- │ └── schema-0
- └── snapshot
- ├── EARLIEST
- ├── LATEST
- └── snapshot-1
-```
diff --git a/docs/docs/concepts/views.md b/docs/docs/concepts/views.md
index 171863ae65..8f0559fde9 100644
--- a/docs/docs/concepts/views.md
+++ b/docs/docs/concepts/views.md
@@ -102,5 +102,5 @@ CALL sys.alter_view_dialect('view_identifier', 'drop', 'spark');
## See also
- [Spark SQL DDL – Views](../spark/sql-ddl#view)
-- [REST Catalog Overview](./rest/overview)
+- [REST Catalog Overview](./rest/)
- [REST Catalog View API](./rest/rest-api)
diff --git a/docs/docs/ecosystem/index.md b/docs/docs/ecosystem/index.md
index 7962021125..1e00fde741 100644
--- a/docs/docs/ecosystem/index.md
+++ b/docs/docs/ecosystem/index.md
@@ -21,3 +21,59 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+## Compatibility Matrix
+
+| Engine | Version | Batch Read | Batch Write | Create Table | Alter Table | Streaming Write | Streaming Read | Batch Overwrite | DELETE & UPDATE | MERGE INTO | Time Travel |
+|:-------------------------------------------------------------------------------:|:-------------:|:-----------:|:-----------:|:-------------:|:-------------:|:----------------:|:----------------:|:---------------:|:---------------:|:----------:|:-----------:|
+| Flink | 1.16 - 1.20 | ✅ | ✅ | ✅ | ✅(1.17+) | ✅ | ✅ | ✅ | ✅(1.17+) | ❌ | ✅ |
+| Spark | 3.2 - 4.0 | ✅ | ✅ | ✅ | ✅ | ✅(3.3+) | ✅(3.3+) | ✅ | ✅ | ✅ | ✅(3.3+) |
+| Hive | 2.1 - 3.1 | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
+| Trino | 420 - 440 | ✅ | ✅(427+) | ✅(427+) | ✅(427+) | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
+| Presto | 0.236 - 0.280 | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/) | 3.1+ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
+| [Doris](https://doris.apache.org/docs/dev/lakehouse/catalogs/paimon-catalog) | 2.0.6+ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
+
+## Streaming Engines
+
+### Flink Streaming
+
+Flink is the most comprehensive streaming computing engine that is widely used for data CDC ingestion and the
+construction of streaming pipelines.
+
+Recommended version is Flink 1.17.2.
+
+### Spark Streaming
+
+You can also use Spark Streaming to build a streaming pipeline. Spark's schema evolution capability will be better
+implemented, but you must accept the mechanism of mini-batch.
+
+## Batch Engines
+
+### Spark Batch
+
+Spark Batch is the most widely used batch computing engine.
+
+Recommended version is Spark 3.5.8.
+
+### Flink Batch
+
+Flink Batch is also available, which can make your pipeline more integrated with streaming and batch unified.
+
+## OLAP Engines
+
+### StarRocks
+
+StarRocks is the most recommended OLAP engine with the most advanced integration.
+
+Recommended version is StarRocks 3.2.6.
+
+### Other OLAP
+
+You can also use Doris and Trino and Presto, or, you can just use Spark, Flink and Hive to query Paimon tables.
+
+## Download
+
+[Download Link](../project/download#engine-jars)
diff --git a/docs/docs/ecosystem/overview.md b/docs/docs/ecosystem/overview.md
deleted file mode 100644
index ada5e1f448..0000000000
--- a/docs/docs/ecosystem/overview.md
+++ /dev/null
@@ -1,79 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-## Compatibility Matrix
-
-| Engine | Version | Batch Read | Batch Write | Create Table | Alter Table | Streaming Write | Streaming Read | Batch Overwrite | DELETE & UPDATE | MERGE INTO | Time Travel |
-|:-------------------------------------------------------------------------------:|:-------------:|:-----------:|:-----------:|:-------------:|:-------------:|:----------------:|:----------------:|:---------------:|:---------------:|:----------:|:-----------:|
-| Flink | 1.16 - 1.20 | ✅ | ✅ | ✅ | ✅(1.17+) | ✅ | ✅ | ✅ | ✅(1.17+) | ❌ | ✅ |
-| Spark | 3.2 - 4.0 | ✅ | ✅ | ✅ | ✅ | ✅(3.3+) | ✅(3.3+) | ✅ | ✅ | ✅ | ✅(3.3+) |
-| Hive | 2.1 - 3.1 | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
-| Trino | 420 - 440 | ✅ | ✅(427+) | ✅(427+) | ✅(427+) | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
-| Presto | 0.236 - 0.280 | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/) | 3.1+ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
-| [Doris](https://doris.apache.org/docs/dev/lakehouse/catalogs/paimon-catalog) | 2.0.6+ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
-
-## Streaming Engines
-
-### Flink Streaming
-
-Flink is the most comprehensive streaming computing engine that is widely used for data CDC ingestion and the
-construction of streaming pipelines.
-
-Recommended version is Flink 1.17.2.
-
-### Spark Streaming
-
-You can also use Spark Streaming to build a streaming pipeline. Spark's schema evolution capability will be better
-implemented, but you must accept the mechanism of mini-batch.
-
-## Batch Engines
-
-### Spark Batch
-
-Spark Batch is the most widely used batch computing engine.
-
-Recommended version is Spark 3.5.8.
-
-### Flink Batch
-
-Flink Batch is also available, which can make your pipeline more integrated with streaming and batch unified.
-
-## OLAP Engines
-
-### StarRocks
-
-StarRocks is the most recommended OLAP engine with the most advanced integration.
-
-Recommended version is StarRocks 3.2.6.
-
-### Other OLAP
-
-You can also use Doris and Trino and Presto, or, you can just use Spark, Flink and Hive to query Paimon tables.
-
-## Download
-
-[Download Link](../project/download#engine-jars)
diff --git a/docs/docs/flink/action-jars.md b/docs/docs/flink/action-jars.md
index 3c71dc727b..b2c3e5a73e 100644
--- a/docs/docs/flink/action-jars.md
+++ b/docs/docs/flink/action-jars.md
@@ -49,7 +49,7 @@ Paimon supports "MERGE INTO" via submitting the 'merge_into' job through `flink
:::info
Important table properties setting:
-1. Only [primary key table](../primary-key-table/overview) supports this feature.
+1. Only [primary key table](../primary-key-table/) supports this feature.
2. The action won't produce UPDATE_BEFORE, so it's not recommended to set 'changelog-producer' = 'input'.
:::
diff --git a/docs/docs/flink/sql-write.mdx b/docs/docs/flink/sql-write.mdx
index c34211f8de..33114d00a1 100644
--- a/docs/docs/flink/sql-write.mdx
+++ b/docs/docs/flink/sql-write.mdx
@@ -52,9 +52,9 @@ For multiple jobs to write the same table, you can refer to [dedicated compactio
### Clustering
-In Paimon, clustering is a feature that allows you to cluster data in your [Append Table](../append-table/overview)
+In Paimon, clustering is a feature that allows you to cluster data in your [Append Table](../append-table/)
based on the values of certain columns during the write process. This organization of data can significantly enhance the efficiency of downstream
-tasks when reading the data, as it enables faster and more targeted data retrieval. This feature is only supported for [Append Table](../append-table/overview)(bucket = -1)
+tasks when reading the data, as it enables faster and more targeted data retrieval. This feature is only supported for [Append Table](../append-table/)(bucket = -1)
and batch execution mode.
To utilize clustering, you can specify the columns you want to cluster when creating or writing to a table. Here's a simple example of how to enable clustering:
@@ -177,8 +177,8 @@ PARTITION (k0 = 0, k1 = 0) SELECT v FROM my_table WHERE false;
:::info
Important table properties setting:
-1. Only [primary key table](../primary-key-table/overview) supports this feature.
-2. [MergeEngine](../primary-key-table/merge-engine/) needs to be [deduplicate](../primary-key-table/merge-engine/overview#deduplicate)
+1. Only [primary key table](../primary-key-table/) supports this feature.
+2. [MergeEngine](../primary-key-table/merge-engine/) needs to be [deduplicate](../primary-key-table/merge-engine/#deduplicate)
or [partial-update](../primary-key-table/merge-engine/partial-update) to support this feature.
3. Do not support updating primary keys.
@@ -215,8 +215,8 @@ UPDATE my_table SET b = 1, c = 2 WHERE a = 'myTable';
Important table properties setting:
1. Only primary key tables support this feature.
-2. If the table has primary keys, the following [MergeEngine](../primary-key-table/merge-engine/overview) support this feature:
- * [deduplicate](../primary-key-table/merge-engine/overview#deduplicate).
+2. If the table has primary keys, the following [MergeEngine](../primary-key-table/merge-engine/) support this feature:
+ * [deduplicate](../primary-key-table/merge-engine/#deduplicate).
* [partial-update](../primary-key-table/merge-engine/partial-update) with option 'partial-update.remove-record-on-delete' enabled.
3. Do not support deleting from table in streaming mode.
diff --git a/docs/docs/iceberg/index.md b/docs/docs/iceberg/index.md
index 34a3b1bb1b..6c4b307517 100644
--- a/docs/docs/iceberg/index.md
+++ b/docs/docs/iceberg/index.md
@@ -21,3 +21,91 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+Paimon supports generating Iceberg compatible metadata,
+so that Paimon tables can be consumed directly by Iceberg readers.
+
+Set the following table options, so that Paimon tables can generate Iceberg compatible metadata.
+
+<table class="table table-bordered">
+ <thead>
+ <tr>
+ <th class="text-left" style={{width: "20%"}}>Option</th>
+ <th class="text-left" style={{width: "5%"}}>Default</th>
+ <th class="text-left" style={{width: "10%"}}>Type</th>
+ <th class="text-left" style={{width: "60%"}}>Description</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><h5>metadata.iceberg.storage</h5></td>
+ <td style={{wordWrap: "break-word"}}>disabled</td>
+ <td>Enum</td>
+ <td>
+ When set, produce Iceberg metadata after a snapshot is committed, so that Iceberg readers can read Paimon's raw data files.
+ <ul>
+ <li><code>disabled</code>: Disable Iceberg compatibility support.</li>
+ <li><code>table-location</code>: Store Iceberg metadata in each table's directory.</li>
+ <li><code>hadoop-catalog</code>: Store Iceberg metadata in a separate directory. This directory can be specified as the warehouse directory of an Iceberg Hadoop catalog.</li>
+ <li><code>hive-catalog</code>: Not only store Iceberg metadata like hadoop-catalog, but also create Iceberg external table in Hive.</li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td><h5>metadata.iceberg.storage-location</h5></td>
+ <td style={{wordWrap: "break-word"}}>(none)</td>
+ <td>Enum</td>
+ <td>
+ Specifies where to store Iceberg metadata files. If not set, the storage location will default based on the selected metadata.iceberg.storage type.
+ <ul>
+ <li><code>table-location</code>: Store Iceberg metadata in each table's directory. Useful for standalone Iceberg tables or Iceberg Java API access. Can also be used with Hive Catalog.</li>
+ <li><code>catalog-location</code>: Store Iceberg metadata in a separate directory. This is the default behavior when using Hive Catalog or Hadoop Catalog.</li>
+ </ul>
+ </td>
+ </tr>
+ </tbody>
+</table>
+
+For most SQL users, we recommend setting `'metadata.iceberg.storage' = 'hadoop-catalog'`
+or `'metadata.iceberg.storage' = 'hive-catalog'`,
+so that all tables can be visited as an Iceberg warehouse.
+For Iceberg Java API users, you might consider setting `'metadata.iceberg.storage' = 'table-location'`,
+so you can visit each table with its table path.
+When using `metadata.iceberg.storage = hadoop-catalog` or `hive-catalog`,
+you can optionally configure `metadata.iceberg.storage-location` to control where the metadata is stored.
+If not set, the default behavior depends on the storage type.
+
+## Supported Types
+
+Paimon Iceberg compatibility currently supports the following data types.
+
+| Paimon Data Type | Iceberg Data Type |
+|----------------|-------------------|
+| `BOOLEAN` | `boolean` |
+| `INT` | `int` |
+| `BIGINT` | `long` |
+| `FLOAT` | `float` |
+| `DOUBLE` | `double` |
+| `DECIMAL` | `decimal` |
+| `CHAR` | `string` |
+| `VARCHAR` | `string` |
+| `BINARY` | `binary` |
+| `VARBINARY` | `binary` |
+| `DATE` | `date` |
+| `TIMESTAMP` (precision 3-6) | `timestamp` |
+| `TIMESTAMP_LTZ` (precision 3-6) | `timestamptz` |
+| `TIMESTAMP` (precision 7-9) | `timestamp_ns` |
+| `TIMESTAMP_LTZ` (precision 7-9) | `timestamptz_ns` |
+| `ARRAY` | `list` |
+| `MAP` | `map` |
+| `ROW` | `struct` |
+
+:::info
+
+**Note on Timestamp Types:**
+- `TIMESTAMP` and `TIMESTAMP_LTZ` types with precision from 3 to 6 are mapped to standard Iceberg timestamp types
+- `TIMESTAMP` and `TIMESTAMP_LTZ` types with precision from 7 to 9 use nanosecond precision and require Iceberg v3 format
+
+:::
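The timestamp rows of the mapping table in the added `iceberg/index.md` can be read as a small lookup. The following is only an illustrative sketch and not part of this commit or of Paimon's code: the function name `iceberg_type_for_timestamp` and its parameters are hypothetical, and precisions outside the ranges listed in the table are treated as "not listed" rather than guessed.

```python
def iceberg_type_for_timestamp(precision: int, local_time_zone: bool = False) -> str:
    """Return the Iceberg type name that the mapping table lists for a Paimon timestamp.

    `local_time_zone=True` stands for TIMESTAMP_LTZ, False for TIMESTAMP.
    This helper is hypothetical; it only mirrors the documented table.
    """
    if 3 <= precision <= 6:
        # Precision 3-6 maps to the standard Iceberg timestamp types.
        return "timestamptz" if local_time_zone else "timestamp"
    if 7 <= precision <= 9:
        # Precision 7-9 maps to nanosecond types, which the note says need Iceberg v3.
        return "timestamptz_ns" if local_time_zone else "timestamp_ns"
    raise ValueError(f"precision {precision} is not listed in the mapping table")
```

For example, `iceberg_type_for_timestamp(9, local_time_zone=True)` follows the `TIMESTAMP_LTZ` (precision 7-9) row of the table.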
diff --git a/docs/docs/iceberg/overview.md b/docs/docs/iceberg/overview.md
deleted file mode 100644
index 2519d32a0a..0000000000
--- a/docs/docs/iceberg/overview.md
+++ /dev/null
@@ -1,111 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-Paimon supports generating Iceberg compatible metadata,
-so that Paimon tables can be consumed directly by Iceberg readers.
-
-Set the following table options, so that Paimon tables can generate Iceberg compatible metadata.
-
-<table class="table table-bordered">
- <thead>
- <tr>
- <th class="text-left" style="width: 20%">Option</th>
- <th class="text-left" style="width: 5%">Default</th>
- <th class="text-left" style="width: 10%">Type</th>
- <th class="text-left" style="width: 60%">Description</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td><h5>metadata.iceberg.storage</h5></td>
- <td style="word-wrap: break-word;">disabled</td>
- <td>Enum</td>
- <td>
- When set, produce Iceberg metadata after a snapshot is committed, so that Iceberg readers can read Paimon's raw data files.
- <ul>
- <li><code>disabled</code>: Disable Iceberg compatibility support.</li>
- <li><code>table-location</code>: Store Iceberg metadata in each table's directory.</li>
- <li><code>hadoop-catalog</code>: Store Iceberg metadata in a separate directory. This directory can be specified as the warehouse directory of an Iceberg Hadoop catalog.</li>
- <li><code>hive-catalog</code>: Not only store Iceberg metadata like hadoop-catalog, but also create Iceberg external table in Hive.</li>
- </ul>
- </td>
- </tr>
- <tr>
- <td><h5>metadata.iceberg.storage-location</h5></td>
- <td style="word-wrap: break-word;">(none)</td>
- <td>Enum</td>
- <td>
- Specifies where to store Iceberg metadata files. If not set, the storage location will default based on the selected metadata.iceberg.storage type.
- <ul>
- <li><code>table-location</code>: Store Iceberg metadata in each table's directory. Useful for standalone Iceberg tables or Iceberg Java API access. Can also be used with Hive Catalog.</li>
- <li><code>catalog-location</code>: Store Iceberg metadata in a separate directory. This is the default behavior when using Hive Catalog or Hadoop Catalog.</li>
- </ul>
- </td>
- </tr>
- </tbody>
-</table>
-
-For most SQL users, we recommend setting `'metadata.iceberg.storage' = 'hadoop-catalog'
-or `'metadata.iceberg.storage' = 'hive-catalog'`,
-so that all tables can be visited as an Iceberg warehouse.
-For Iceberg Java API users, you might consider setting `'metadata.iceberg.storage' = 'table-location'`,
-so you can visit each table with its table path.
-When using `metadata.iceberg.storage = hadoop-catalog` or `hive-catalog`,
-you can optionally configure `metadata.iceberg.storage-location` to control where the metadata is stored.
-If not set, the default behavior depends on the storage type.
-
-## Supported Types
-
-Paimon Iceberg compatibility currently supports the following data types.
-
-| Paimon Data Type | Iceberg Data Type |
-|----------------|-------------------|
-| `BOOLEAN` | `boolean` |
-| `INT` | `int` |
-| `BIGINT` | `long` |
-| `FLOAT` | `float` |
-| `DOUBLE` | `double` |
-| `DECIMAL` | `decimal` |
-| `CHAR` | `string` |
-| `VARCHAR` | `string` |
-| `BINARY` | `binary` |
-| `VARBINARY` | `binary` |
-| `DATE` | `date` |
-| `TIMESTAMP` (precision 3-6) | `timestamp` |
-| `TIMESTAMP_LTZ` (precision 3-6) | `timestamptz` |
-| `TIMESTAMP` (precision 7-9) | `timestamp_ns` |
-| `TIMESTAMP_LTZ` (precision 7-9) | `timestamptz_ns` |
-| `ARRAY` | `list` |
-| `MAP` | `map` |
-| `ROW` | `struct` |
-
-:::info
-
-**Note on Timestamp Types:**
-- `TIMESTAMP` and `TIMESTAMP_LTZ` types with precision from 3 to 6 are mapped to standard Iceberg timestamp types
-- `TIMESTAMP` and `TIMESTAMP_LTZ` types with precision from 7 to 9 use nanosecond precision and require Iceberg v3 format
-
-:::
diff --git a/docs/docs/index.mdx b/docs/docs/index.mdx
index 028b06b3ee..f58286a107 100644
--- a/docs/docs/index.mdx
+++ b/docs/docs/index.mdx
@@ -34,7 +34,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
<div className="card-grid">
-<a className="nav-card" href="concepts/overview">
+<a className="nav-card" href="concepts/">
<div className="nav-card-icon">📖</div>
<div className="nav-card-body">
<h3>Concepts</h3>
@@ -42,7 +42,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
</div>
</a>
-<a className="nav-card" href="primary-key-table/overview">
+<a className="nav-card" href="primary-key-table/">
<div className="nav-card-icon">🔑</div>
<div className="nav-card-body">
<h3>Table with PK</h3>
@@ -50,7 +50,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
</div>
</a>
-<a className="nav-card" href="append-table/overview">
+<a className="nav-card" href="append-table/">
<div className="nav-card-icon">📋</div>
<div className="nav-card-body">
<h3>Table w/o PK</h3>
@@ -74,7 +74,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
</div>
</a>
-<a className="nav-card" href="pypaimon/overview">
+<a className="nav-card" href="pypaimon/">
<div className="nav-card-icon">🐍</div>
<div className="nav-card-body">
<h3>PyPaimon & AI</h3>
@@ -82,7 +82,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
</div>
</a>
-<a className="nav-card" href="ecosystem/overview">
+<a className="nav-card" href="ecosystem/">
<div className="nav-card-icon">🔗</div>
<div className="nav-card-body">
<h3>Ecosystem</h3>
@@ -90,7 +90,7 @@ Data Lake Platform — unified batch, streaming, and multimodal AI in a single l
</div>
</a>
-<a className="nav-card" href="cdc-ingestion/overview">
+<a className="nav-card" href="cdc-ingestion/">
<div className="nav-card-icon">🔄</div>
<div className="nav-card-body">
<h3>CDC Ingestion</h3>
diff --git a/docs/docs/learn-paimon/scenario-guide.mdx b/docs/docs/learn-paimon/scenario-guide.mdx
index 57866da5d1..877c055f9d 100644
--- a/docs/docs/learn-paimon/scenario-guide.mdx
+++ b/docs/docs/learn-paimon/scenario-guide.mdx
@@ -53,7 +53,7 @@ configurations that are suited for different scenarios.
## Primary Key Table
Use a Primary Key Table when your data has a natural unique key and you need **real-time updates** (insert, update, delete).
-See [Primary Key Table Overview](../primary-key-table/overview).
+See [Primary Key Table Overview](../primary-key-table/).
### Scenario 1: CDC Real-Time Sync
@@ -93,7 +93,7 @@ CREATE TABLE orders (
on data volume. If you are sensitive to data visibility latency, set a fixed bucket number (e.g. `'bucket' = '5'`)
— roughly 1 bucket per 1GB of data in a partition.
-**CDC Ingestion Tip:** Use [Paimon CDC Ingestion](../cdc-ingestion/overview) for whole-database sync with
+**CDC Ingestion Tip:** Use [Paimon CDC Ingestion](../cdc-ingestion/) for whole-database sync with
automatic table creation and schema evolution support.
### Scenario 2: Multi-Stream Partial Column Updates
@@ -204,7 +204,7 @@ record for each primary key and produces insert-only changelog, making it perfec
Use an Append Table when your data **has no natural primary key**, or you are working with **batch ETL** pipelines
where data is only inserted and does not need upsert semantics.
-See [Append Table Overview](../append-table/overview).
+See [Append Table Overview](../append-table/).
Compared to Primary Key Tables, Append Tables have much better batch read/write performance, simpler design, and lower
resource consumption. **We recommend using Append Tables for most batch processing scenarios.**
@@ -571,7 +571,7 @@ df = table_read.to_pandas(splits)
arrow_table = table_read.to_arrow(splits)
```
-**Why:** [PyPaimon](../pypaimon/overview) is a pure Python SDK (no JDK required) that integrates seamlessly
+**Why:** [PyPaimon](../pypaimon/) is a pure Python SDK (no JDK required) that integrates seamlessly
with the Python AI ecosystem:
- **PyTorch**: Direct `DataLoader` integration with streaming and prefetch support.
diff --git a/docs/docs/learn-paimon/understand-files.mdx b/docs/docs/learn-paimon/understand-files.mdx
index e63b2e647f..10c993a3e4 100644
--- a/docs/docs/learn-paimon/understand-files.mdx
+++ b/docs/docs/learn-paimon/understand-files.mdx
@@ -42,7 +42,7 @@ Before delving further into this page, please ensure that you have read through
following sections:
1. [Basic Concepts](../concepts/basic-concepts),
-2. [Primary Key Table](../primary-key-table/overview) and [Append Table](../append-table/overview)
+2. [Primary Key Table](../primary-key-table/) and [Append Table](../append-table/)
3. How to use Paimon in [Flink](../flink).
## Understand File Operations
diff --git a/docs/docs/maintenance/dedicated-compaction.mdx b/docs/docs/maintenance/dedicated-compaction.mdx
index ea781e70ac..70a1d68740 100644
--- a/docs/docs/maintenance/dedicated-compaction.mdx
+++ b/docs/docs/maintenance/dedicated-compaction.mdx
@@ -282,7 +282,7 @@ For more usage of the compact_database action, see
## Sort Compact
If your table is configured with [dynamic bucket primary key table](../primary-key-table/data-distribution#dynamic-bucket)
-or [append table](../append-table/overview) ,
+or [append table](../append-table/) ,
you can trigger a compact with specified column sort to speed up queries.
<Tabs groupId="sort-compaction-job">
diff --git a/docs/docs/migration/migration-from-hive.mdx b/docs/docs/migration/migration-from-hive.mdx
index 2cce328521..6e84855937 100644
--- a/docs/docs/migration/migration-from-hive.mdx
+++ b/docs/docs/migration/migration-from-hive.mdx
@@ -29,7 +29,7 @@ under the License.
Apache Hive supports ORC, Parquet file formats that could be migrated to Paimon.
When migrating data to a paimon table, the origin table will be permanently disappeared. So please back up your data if you
-still need the original table. The migrated table will be [append table](../append-table/overview).
+still need the original table. The migrated table will be [append table](../append-table/).
Now, we can use paimon hive catalog with Migrate Table Procedure to totally migrate a table from hive to paimon.
At the same time, you can use paimon hive catalog with Migrate Database Procedure to fully synchronize all tables in the database to paimon.
diff --git a/docs/docs/primary-key-table/compaction.md b/docs/docs/primary-key-table/compaction.md
index 5a57b9a49d..8cfb46b77b 100644
--- a/docs/docs/primary-key-table/compaction.md
+++ b/docs/docs/primary-key-table/compaction.md
@@ -167,7 +167,7 @@ Its value depends on your memory size.
### Number of Sorted Runs to Trigger Compaction
-Paimon uses [LSM tree](./overview#lsm-trees) which supports a large number of updates. LSM organizes files in several [sorted runs](./overview#sorted-runs). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
+Paimon uses [LSM tree](./#lsm-trees) which supports a large number of updates. LSM organizes files in several [sorted runs](./#sorted-runs). When querying records from an LSM tree, all sorted runs must be combined to produce a complete view of all records.
One can easily see that too many sorted runs will result in poor query performance. To keep the number of sorted runs in a reasonable range, Paimon writers will automatically perform [compactions](./compaction). The following table property determines the minimum number of sorted runs to trigger a compaction.
diff --git a/docs/docs/primary-key-table/data-distribution.md b/docs/docs/primary-key-table/data-distribution.md
index 8b74724ff0..95e8c01aa6 100644
--- a/docs/docs/primary-key-table/data-distribution.md
+++ b/docs/docs/primary-key-table/data-distribution.md
@@ -24,7 +24,7 @@ under the License.
# Data Distribution
-A bucket is the smallest storage unit for reads and writes, each bucket directory contains an [LSM tree](./overview#lsm-trees).
+A bucket is the smallest storage unit for reads and writes, each bucket directory contains an [LSM tree](./#lsm-trees).
## Fixed Bucket
diff --git a/docs/docs/primary-key-table/index.md b/docs/docs/primary-key-table/index.md
index 0c4e3cc1cd..a912bc6ac7 100644
--- a/docs/docs/primary-key-table/index.md
+++ b/docs/docs/primary-key-table/index.md
@@ -21,3 +21,39 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+If you define a table with primary key, you can insert, update or delete records in the table.
+
+Primary keys consist of a set of columns that contain unique values for each record. Paimon enforces data ordering by
+sorting the primary key within each bucket, allowing users to achieve high performance by applying filtering conditions
+on the primary key. See [CREATE TABLE](../flink/sql-ddl#create-table).
+
+## Bucket
+
+Unpartitioned tables, or partitions in partitioned tables, are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying.
+
+Each bucket directory contains an LSM tree and its [changelog files](./changelog-producer).
+
+The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the [`bucket-key` option](../maintenance/configurations#coreoptions). If no `bucket-key` option is specified, the primary key (if defined) or the complete record will be used as the bucket key.
+
+A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 200MB - 1GB.
+
+Also, see [rescale bucket](../maintenance/rescale-bucket) if you want to adjust the number of buckets after a table is created.
+
+## LSM Trees
+
+Paimon adopts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts about LSM trees.
+
+### Sorted Runs
+
+LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple data files and each data file belongs to exactly one sorted run.
+
+Records within a data file are sorted by their primary keys. Within a sorted run, ranges of primary keys of data files never overlap.
+
+
+
+As you can see, different sorted runs may have overlapped primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine](./merge-engine/) and the timestamp of each record.
+
+New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
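The read path described in the added "Sorted Runs" text (combine all sorted runs, then merge records that share a primary key) can be sketched as a k-way merge. This is an illustration only, not Paimon's implementation and not part of this commit: the `Record` tuple layout, the use of a sequence number in place of the record timestamp, and the "newest record wins" rule (the behavior the docs attribute to the `deduplicate` merge engine) are all assumptions.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (primary_key, sequence_number, value).
Record = Tuple[str, int, str]


def merge_sorted_runs(runs: Iterable[List[Record]]) -> Iterator[Record]:
    """Merge runs that are each sorted by primary key, keeping the newest record per key.

    Assumes a primary key appears at most once inside a single run, so every run is
    also sorted by (primary_key, sequence_number).
    """
    merged = heapq.merge(*runs, key=lambda r: (r[0], r[1]))
    current = None
    for record in merged:
        if current is not None and record[0] != current[0]:
            # The key changed, so `current` holds the newest record of the previous key.
            yield current
        current = record
    if current is not None:
        yield current
```

Because each run is already sorted, the merge streams through the runs without loading them fully into memory, which is one reason the number of sorted runs matters for query cost.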
diff --git a/docs/docs/primary-key-table/merge-engine/index.md b/docs/docs/primary-key-table/merge-engine/index.md
index 4b28281e8a..a7f486b21d 100644
--- a/docs/docs/primary-key-table/merge-engine/index.md
+++ b/docs/docs/primary-key-table/merge-engine/index.md
@@ -21,3 +21,24 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+When Paimon sink receives two or more records with the same primary keys, it will merge them into one record to keep
+primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
+
+:::info
+
+Always set `table.exec.sink.upsert-materialize` to `NONE` in Flink SQL TableConfig, sink upsert-materialize may
+result in strange behavior. When the input is out of order, we recommend that you use
+[Sequence Field](../sequence-rowkind#sequence-field) to correct disorder.
+
+:::
+
+## Deduplicate
+
+The `deduplicate` merge engine is the default merge engine. Paimon will only keep the latest record and throw away
+other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
+You can config `ignore-delete` to ignore it.
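The `deduplicate` behavior described in the added text can be illustrated with a toy model. This is a sketch under stated assumptions, not Paimon code and not part of this commit: the `Change` tuple, the row-kind strings, and the reading of `ignore-delete` as "skip `DELETE` records" are hypothetical simplifications of the documented behavior.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change layout in arrival order: (primary_key, row_kind, value),
# where row_kind is "INSERT" or "DELETE".
Change = Tuple[str, str, str]


def deduplicate(changes: Iterable[Change], ignore_delete: bool = False) -> Dict[str, str]:
    """Keep only the latest value per primary key, modelling the deduplicate merge engine."""
    result: Dict[str, str] = {}
    for key, kind, value in changes:
        if kind == "DELETE":
            if ignore_delete:
                # Assumed meaning of 'ignore-delete': the DELETE record is skipped.
                continue
            # The latest record is a DELETE, so the key disappears from the result.
            result.pop(key, None)
        else:
            # A later record replaces any earlier record with the same key.
            result[key] = value
    return result
```

With `ignore_delete=False`, a key whose last change is a `DELETE` is absent from the result; with `ignore_delete=True`, its last non-delete value is kept.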
diff --git a/docs/docs/primary-key-table/merge-engine/overview.md b/docs/docs/primary-key-table/merge-engine/overview.md
deleted file mode 100644
index d767c51712..0000000000
--- a/docs/docs/primary-key-table/merge-engine/overview.md
+++ /dev/null
@@ -1,44 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-When Paimon sink receives two or more records with the same primary keys, it will merge them into one record to keep
-primary keys unique. By specifying the `merge-engine` table property, users can choose how records are merged together.
-
-:::info
-
-Always set `table.exec.sink.upsert-materialize` to `NONE` in Flink SQL TableConfig, sink upsert-materialize may
-result in strange behavior. When the input is out of order, we recommend that you use
-[Sequence Field](../sequence-rowkind#sequence-field) to correct disorder.
-
-:::
-
-## Deduplicate
-
-The `deduplicate` merge engine is the default merge engine. Paimon will only keep the latest record and throw away
-other records with the same primary keys.
-
-Specifically, if the latest record is a `DELETE` record, all records with the same primary keys will be deleted.
-You can config `ignore-delete` to ignore it.
diff --git a/docs/docs/primary-key-table/overview.md b/docs/docs/primary-key-table/overview.md
deleted file mode 100644
index 308861b46c..0000000000
--- a/docs/docs/primary-key-table/overview.md
+++ /dev/null
@@ -1,59 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-If you define a table with primary key, you can insert, update or delete records in the table.
-
-Primary keys consist of a set of columns that contain unique values for each record. Paimon enforces data ordering by
-sorting the primary key within each bucket, allowing users to achieve high performance by applying filtering conditions
-on the primary key. See [CREATE TABLE](../flink/sql-ddl#create-table).
-
-## Bucket
-
-Unpartitioned tables, or partitions in partitioned tables, are sub-divided into buckets, to provide extra structure to the data that may be used for more efficient querying.
-
-Each bucket directory contains an LSM tree and its [changelog files](./changelog-producer).
-
-The range for a bucket is determined by the hash value of one or more columns in the records. Users can specify bucketing columns by providing the [`bucket-key` option](../maintenance/configurations#coreoptions). If no `bucket-key` option is specified, the primary key (if defined) or the complete record will be used as the bucket key.
-
-A bucket is the smallest storage unit for reads and writes, so the number of buckets limits the maximum processing parallelism. This number should not be too big, though, as it will result in lots of small files and low read performance. In general, the recommended data size in each bucket is about 200MB - 1GB.
-
-Also, see [rescale bucket](../maintenance/rescale-bucket) if you want to adjust the number of buckets after a table is created.
-
-## LSM Trees
-
-Paimon adopts the LSM tree (log-structured merge-tree) as the data structure for file storage. This documentation briefly introduces the concepts about LSM trees.
-
-### Sorted Runs
-
-LSM tree organizes files into several sorted runs. A sorted run consists of one or multiple data files and each data file belongs to exactly one sorted run.
-
-Records within a data file are sorted by their primary keys. Within a sorted run, ranges of primary keys of data files never overlap.
-
-
-
-As you can see, different sorted runs may have overlapped primary key ranges, and may even contain the same primary key. When querying the LSM tree, all sorted runs must be combined and all records with the same primary key must be merged according to the user-specified [merge engine](./merge-engine/overview) and the timestamp of each record.
-
-New records written into the LSM tree will be first buffered in memory. When the memory buffer is full, all records in memory will be sorted and flushed to disk. A new sorted run is now created.
diff --git a/docs/docs/program-api/rest-api.mdx b/docs/docs/program-api/rest-api.mdx
index 711f7d8f00..450636b215 100644
--- a/docs/docs/program-api/rest-api.mdx
+++ b/docs/docs/program-api/rest-api.mdx
@@ -27,7 +27,7 @@ under the License.
# REST API
-This is Java API for [REST](../concepts/rest/overview).
+This is Java API for [REST](../concepts/rest/).
## Dependency
diff --git a/docs/docs/pypaimon/index.md b/docs/docs/pypaimon/index.md
index c4185ca2ea..18989aaf82 100644
--- a/docs/docs/pypaimon/index.md
+++ b/docs/docs/pypaimon/index.md
@@ -21,3 +21,32 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
+
+# Overview
+
+PyPaimon is a Python implementation for connecting Paimon catalog, reading & writing tables. The complete Python
+implementation of the brand new PyPaimon does not require JDK installation.
+
+## Environment Settings
+
+SDK is published at [pypaimon](https://pypi.org/project/pypaimon/). You can install by
+
+```shell
+pip install pypaimon
+```
+
+## Build From Source
+
+You can build the source package by executing the following command:
+
+```commandline
+python3 setup.py sdist
+```
+
+The package is under `dist/`. Then you can install the package by executing the following command:
+
+```commandline
+pip3 install dist/*.tar.gz
+```
+
+The command will install the package and core dependencies to your local Python environment.
diff --git a/docs/docs/pypaimon/overview.md b/docs/docs/pypaimon/overview.md
deleted file mode 100644
index e9e759eec2..0000000000
--- a/docs/docs/pypaimon/overview.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: "Overview"
-sidebar_position: 1
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Overview
-
-PyPaimon is a Python implementation for connecting Paimon catalog, reading & writing tables. The complete Python
-implementation of the brand new PyPaimon does not require JDK installation.
-
-## Environment Settings
-
-SDK is published at [pypaimon](https://pypi.org/project/pypaimon/). You can install by
-
-```shell
-pip install pypaimon
-```
-
-## Build From Source
-
-You can build the source package by executing the following command:
-
-```commandline
-python3 setup.py sdist
-```
-
-The package is under `dist/`. Then you can install the package by executing the following command:
-
-```commandline
-pip3 install dist/*.tar.gz
-```
-
-The command will install the package and core dependencies to your local Python environment.
diff --git a/docs/docs/pypaimon/python-api.mdx b/docs/docs/pypaimon/python-api.mdx
index d0c976faa1..52afbc2676 100644
--- a/docs/docs/pypaimon/python-api.mdx
+++ b/docs/docs/pypaimon/python-api.mdx
@@ -105,7 +105,7 @@ catalog_options = {
<TabItem value="rest-catalog" label="rest catalog">
-The sample code is as follows. The detailed meaning of option can be found in [REST](../concepts/rest/overview).
+The sample code is as follows. The detailed meaning of option can be found in [REST](../concepts/rest/).
```python
from pypaimon import CatalogFactory
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 634d0830e9..d6f35bec72 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -11,7 +11,6 @@ const sidebars = {
"id": "concepts/index"
},
"items": [
- "concepts/overview",
"concepts/basic-concepts",
"concepts/concurrency-control",
"concepts/catalog",
@@ -24,10 +23,10 @@ const sidebars = {
"label": "RESTCatalog",
"collapsed": true,
"link": {
- type: "generated-index"
+ type: "doc",
+ "id": "concepts/rest/index"
},
"items": [
- "concepts/rest/overview",
"concepts/rest/bear",
"concepts/rest/dlf",
"concepts/rest/tables",
@@ -40,10 +39,10 @@ const sidebars = {
"label": "Specification",
"collapsed": true,
"link": {
- type: "generated-index"
+ type: "doc",
+ "id": "concepts/spec/index"
},
"items": [
- "concepts/spec/overview",
"concepts/spec/schema",
"concepts/spec/snapshot",
"concepts/spec/manifest",
@@ -65,7 +64,6 @@ const sidebars = {
"id": "primary-key-table/index"
},
"items": [
- "primary-key-table/overview",
"primary-key-table/data-distribution",
"primary-key-table/table-mode",
"primary-key-table/changelog-producer",
@@ -79,10 +77,10 @@ const sidebars = {
"label": "Merge Engine",
"collapsed": true,
"link": {
- type: "generated-index"
+ type: "doc",
+ "id": "primary-key-table/merge-engine/index"
},
"items": [
- "primary-key-table/merge-engine/overview",
"primary-key-table/merge-engine/partial-update",
"primary-key-table/merge-engine/aggregation",
"primary-key-table/merge-engine/first-row"
@@ -99,7 +97,6 @@ const sidebars = {
"id": "append-table/index"
},
"items": [
- "append-table/overview",
"append-table/incremental-clustering",
"append-table/bucketed",
"append-table/row-tracking",
@@ -163,7 +160,6 @@ const sidebars = {
"id": "pypaimon/index"
},
"items": [
- "pypaimon/overview",
"pypaimon/python-api",
"pypaimon/manage-tags",
"pypaimon/ray-data",
@@ -187,7 +183,6 @@ const sidebars = {
"id": "ecosystem/index"
},
"items": [
- "ecosystem/overview",
"ecosystem/starrocks",
"ecosystem/doris",
"ecosystem/hive",
@@ -204,7 +199,6 @@ const sidebars = {
"id": "cdc-ingestion/index"
},
"items": [
- "cdc-ingestion/overview",
"cdc-ingestion/mysql-cdc",
"cdc-ingestion/postgres-cdc",
"cdc-ingestion/kafka-cdc",
@@ -276,7 +270,6 @@ const sidebars = {
"id": "iceberg/index"
},
"items": [
- "iceberg/overview",
"iceberg/append-table",
"iceberg/primary-key-table",
"iceberg/iceberg-tags",