This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 6597ffb5fc9 [paimon] add paimon quick start (#924)
6597ffb5fc9 is described below
commit 6597ffb5fc9b115462af764c32e09a6045978921
Author: Mingyu Chen <[email protected]>
AuthorDate: Tue Jul 30 12:10:10 2024 +0800
[paimon] add paimon quick start (#924)
---
docs/get-starting/quick-start/doris-paimon.md | 270 +++++++++++++++++++++
.../get-starting/quick-start/doris-paimon.md | 270 +++++++++++++++++++++
.../get-starting/quick-start/doris-paimon.md | 269 ++++++++++++++++++++
.../get-starting/quick-start/doris-paimon.md | 269 ++++++++++++++++++++
sidebars.json | 3 +-
.../images/quick-start/lakehouse-paimon-arch.jpeg | Bin 0 -> 385441 bytes
.../quick-start/lakehouse-paimon-benchmark.PNG | Bin 0 -> 60007 bytes
.../get-starting/quick-start/doris-paimon.md | 270 +++++++++++++++++++++
.../get-starting/quick-start/doris-paimon.md | 270 +++++++++++++++++++++
versioned_sidebars/version-2.1-sidebars.json | 5 +-
versioned_sidebars/version-3.0-sidebars.json | 3 +-
11 files changed, 1625 insertions(+), 4 deletions(-)
diff --git a/docs/get-starting/quick-start/doris-paimon.md
b/docs/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..cf1956c6e2d
--- /dev/null
+++ b/docs/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,270 @@
+---
+{
+ "title": "Apache Doris & Paimon Quick Start",
+ "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The Data Lakehouse is an open data management architecture that combines the high performance and real-time capabilities of data warehouses with the low cost and flexibility of data lakes. It helps users meet a wide range of data processing and analysis needs more conveniently, and it is increasingly used in enterprise big data systems.
+
+Over recent versions, Apache Doris has steadily deepened its integration with data lakes and has developed a mature Data Lakehouse solution.
+
+- In version 0.15, Apache Doris introduced Hive and Iceberg external tables, exploring integration with data lakes on top of Apache Iceberg.
+- In version 1.2, Apache Doris officially introduced the Multi-Catalog feature, which provides automatic metadata mapping and data access for a variety of data sources, together with many performance optimizations for external data reading and query execution. With these changes, Doris is fully capable of supporting a fast, easy-to-use Lakehouse architecture.
+- In version 2.1, Apache Doris' Data Lakehouse architecture was significantly enhanced: it strengthened reading and writing for mainstream data lake formats (Hudi, Iceberg, Paimon, etc.), added compatibility with multiple SQL dialects, and enabled seamless migration from existing systems to Apache Doris. For data science and large-scale data reading scenarios, Doris
integrated the Arrow Flight high-speed reading interface, achieving a 100-fold
improvement in data transfer eff [...]
+
+
+
+## Apache Doris & Paimon
+
+Apache Paimon is a data lake format that combines the advantages of data lake formats with an LSM structure, bringing efficient real-time streaming updates to data lake architectures. This lets Paimon manage data efficiently and support real-time analysis, which makes it a strong foundation for a real-time Data Lakehouse architecture.
+
+To fully leverage Paimon's capabilities and improve query efficiency for
Paimon data, Apache Doris provides native support for several of Paimon's
latest features:
+
+- Supports various types of Paimon Catalogs such as Hive Metastore and
FileSystem.
+- Native support for Paimon 0.6's Primary Key Table Read Optimized feature.
+- Native support for Paimon 0.8's Primary Key Table Deletion Vector feature.
+
+With Apache Doris' high-performance query engine and Apache Paimon's efficient
real-time streaming update capabilities, users can achieve:
+
+- Real-time data ingestion into the lake: With Paimon's LSM-Tree model, ingestion latency can be reduced to the minute level. Paimon also supports several update modes, including aggregation, deduplication, and partial column updates, which makes data flows more flexible and efficient.
+- High-performance data processing and analysis: Paimon features such as Append Only Table, Read Optimized, and Deletion Vector integrate with Doris' query engine, enabling fast queries and analysis on lake data.
+
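The update modes listed above can be pictured with a toy model. The following Python sketch is an illustration only, not Paimon's implementation: the function names (`merge_deduplicate`, `merge_partial_update`, `merge_sum`) and the sample rows are hypothetical, and each "run" stands in for one batch of writes, ordered from oldest to newest.

```python
from typing import Any, Dict, List, Tuple

# One "run" is a batch of (primary_key, row) writes; runs are ordered oldest to newest.
Row = Dict[str, Any]
Run = List[Tuple[int, Row]]


def merge_deduplicate(runs: List[Run]) -> Dict[int, Row]:
    """Deduplication: for each key, keep only the most recently written row."""
    merged: Dict[int, Row] = {}
    for run in runs:
        for key, row in run:
            merged[key] = dict(row)
    return merged


def merge_partial_update(runs: List[Run]) -> Dict[int, Row]:
    """Partial update: newer non-null columns overwrite older values per key."""
    merged: Dict[int, Row] = {}
    for run in runs:
        for key, row in run:
            current = merged.setdefault(key, {})
            for column, value in row.items():
                if value is not None or column not in current:
                    current[column] = value
    return merged


def merge_sum(runs: List[Run], column: str) -> Dict[int, float]:
    """Aggregation: sum one numeric column across all writes for each key."""
    totals: Dict[int, float] = {}
    for run in runs:
        for key, row in run:
            value = row.get(column)
            if value is not None:
                totals[key] = totals.get(key, 0.0) + float(value)
    return totals


# Hypothetical sample data: two writes for the same primary key.
older: Run = [(1, {"c_address": "old_address", "c_acctbal": 100.0})]
newer: Run = [(1, {"c_address": "new_address", "c_acctbal": None})]

print(merge_deduplicate([older, newer]))
print(merge_partial_update([older, newer]))
print(merge_sum([older, newer], "c_acctbal"))
```

In this model, deduplication replaces the whole row, while partial update keeps `c_acctbal` from the older write because the newer write left that column null.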
+In the future, Apache Doris will gradually support more advanced features of
Apache Paimon, including Time Travel and incremental data reading, to jointly
build a unified, high-performance, real-time lakehouse platform.
+
+This article explains how to quickly set up an Apache Doris + Apache Paimon test and demonstration environment with Docker, and demonstrates how to use each feature.
+
+For more information, see [Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md).
+
+## User Guide
+
+All scripts and code mentioned in this article can be obtained from the
following address:
[https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)
+
+### 01 Environment Preparation
+
+This article uses Docker Compose for deployment, with the following components
and versions:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | Default 2.1.5, can be modified |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After starting, you can use the following scripts to log in to the Flink
command line or Doris command line:
+
+ ```
+ # Log in to the Flink SQL client
+ bash ./start_flink_client.sh
+
+ # Log in to the Doris client
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging into the Flink command line, you can see a pre-built table. The
table already contains some data that can be viewed using Flink SQL.
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`, `c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+|         name |           type |  null |                         key | extras | watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+|    c_custkey |            INT | FALSE | PRI(c_custkey, c_nationkey) |        |           |
+|       c_name |    VARCHAR(25) |  TRUE |                             |        |           |
+|    c_address |    VARCHAR(40) |  TRUE |                             |        |           |
+|  c_nationkey |            INT | FALSE | PRI(c_custkey, c_nationkey) |        |           |
+|      c_phone |       CHAR(15) |  TRUE |                             |        |           |
+|    c_acctbal | DECIMAL(12, 2) |  TRUE |                             |        |           |
+| c_mktsegment |       CHAR(10) |  TRUE |                             |        |           |
+|    c_comment |   VARCHAR(117) |  TRUE |                             |        |           |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
+### 04 Data Query
+
+As shown below, a Catalog named `paimon` has already been created in the Doris cluster (run `SHOW CATALOGS` to view it). The statement used to create this Catalog is:
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
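When the catalog properties come from configuration rather than being typed by hand, a statement of the shape shown above can be generated. The sketch below is a hypothetical helper (`render_create_catalog` is not part of Doris); it only assembles the statement text and rejects names or values containing quotes, backticks, or backslashes instead of guessing at escaping rules.

```python
from typing import Dict


def render_create_catalog(name: str, properties: Dict[str, str]) -> str:
    """Build a CREATE CATALOG statement in the same shape as the one above."""
    for text in [name, *properties.keys(), *properties.values()]:
        # Hypothetical safety check: refuse characters that would need escaping.
        if any(ch in text for ch in '"`\\'):
            raise ValueError(f"unsupported character in {text!r}")
    body = ",\n".join(
        f'    "{key}" = "{value}"' for key, value in properties.items()
    )
    return f"CREATE CATALOG `{name}` PROPERTIES (\n{body}\n);"


# Property values copied from the demo environment in this article.
demo_properties = {
    "type": "paimon",
    "warehouse": "s3://warehouse/wh/",
    "s3.endpoint": "http://minio:9000",
    "s3.access_key": "admin",
    "s3.secret_key": "password",
    "s3.region": "us-east-1",
}

statement = render_create_catalog("paimon", demo_properties)
print(statement)
```

The credentials here are the demo defaults from this quick start; a real deployment would load them from a secret store rather than from source code.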
+You can query Paimon's data in Doris:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Read Incremental Data
+
+You can update the data in the Paimon table using Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey = 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+After the Flink SQL execution is complete, you can directly view the latest
data in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPC-DS 1000 dataset stored in Paimon 0.8, comparing Apache Doris 2.1.5 with Trino 422, both with the Primary Key Table Read Optimized feature enabled.
+
+
+
+The results show that, on this standard static test set, Doris's average query performance is 3 to 5 times that of Trino. In the future, we will optimize Deletion Vector handling to further improve query efficiency in real business scenarios.
+
+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in Apache Paimon 0.6 lets the query engine read the underlying Parquet/ORC files directly, which significantly improves read efficiency. Incremental data that has not yet been merged (changes produced by INSERT, UPDATE, or DELETE) is read through Merge-on-Read. In addition, Paimon introduced the Deletion Vector feature in version 0.8, which
further enhances the query engine's [...]
+Apache Doris reads Deletion Vectors through its native reader and performs Merge-on-Read. The EXPLAIN output below shows how baseline data and incremental data are read within a single query.
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner) |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ............... |
+| |
+| 0:VPAIMON_SCAN_NODE(68) |
+| table: customer |
+| predicates: (c_nationkey[#3] < 3) |
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4 |
+| partition=3/0 |
+| backends: |
+| 10002 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc start: 0 length: 66600 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc start: 0 length: 44501 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc start: 0 length: 64059 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc start: 0 length: 63164 |
+| cardinality=18751, numNodes=1 |
+| pushdown agg=NONE |
+| paimonNativeReadSplits=4/4 |
+| PaimonSplitStats: |
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| tuple ids: 0 |
+| ............... |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
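Two fields in the plan output above, `paimonNativeReadSplits` and `hasDeletionVector`, can also be pulled out of the plan text with a short script. This is a sketch that assumes the EXPLAIN output keeps the textual layout shown above, which may differ between Doris versions; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Excerpt of the EXPLAIN VERBOSE output shown in this article.
EXPLAIN_SNIPPET = """\
| paimonNativeReadSplits=4/4 |
| PaimonSplitStats: |
| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
"""

SPLIT_STAT = re.compile(
    r"SplitStat \[type=(\w+), rowCount=(\d+), "
    r"rawFileConvertable=(true|false),\s*hasDeletionVector=(true|false)\]"
)


def parse_native_splits(plan: str) -> Tuple[int, int]:
    """Return (native_splits, total_splits) from the paimonNativeReadSplits line."""
    match = re.search(r"paimonNativeReadSplits=(\d+)/(\d+)", plan)
    if match is None:
        raise ValueError("paimonNativeReadSplits not found in plan text")
    return int(match.group(1)), int(match.group(2))


def parse_split_stats(plan: str) -> List[Dict[str, object]]:
    """Return one dictionary per SplitStat entry found in the plan text."""
    return [
        {
            "type": split_type,
            "row_count": int(row_count),
            "raw_file_convertable": convertable == "true",
            "has_deletion_vector": has_dv == "true",
        }
        for split_type, row_count, convertable, has_dv in SPLIT_STAT.findall(plan)
    ]


native, total = parse_native_splits(EXPLAIN_SNIPPET)
stats = parse_split_stats(EXPLAIN_SNIPPET)
print(f"native reader used for {native} of {total} splits")
print([s["has_deletion_vector"] for s in stats])
```

A check like `native == total` could be used in a smoke test to confirm that no split falls back from the native reader.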
+The output shows that the query on the table just updated through Flink SQL is planned as 4 splits, and that all of them can be read by the native reader (paimonNativeReadSplits=4/4). In addition, hasDeletionVector is true for the first split, which means the split has a corresponding Deletion Vector and rows are filtered against it when the split is read.
+
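To make the role of the Deletion Vector concrete, here is a toy Python model of the read path. It is a simplified sketch under stated assumptions, not the Doris or Paimon implementation: a data file is modeled as a list of rows, a Deletion Vector as a set of row positions, and the names `DataFile` and `scan` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple

Row = Tuple[int, str]  # (c_custkey, c_address)


@dataclass
class DataFile:
    rows: List[Row]
    # Positions of rows invalidated by a later update or delete.
    deleted_positions: Set[int] = field(default_factory=set)

    @property
    def has_deletion_vector(self) -> bool:
        return bool(self.deleted_positions)


def scan(files: List[DataFile]) -> Iterator[Row]:
    """Read each file directly and skip positions marked as deleted."""
    for data_file in files:
        for position, row in enumerate(data_file.rows):
            if position not in data_file.deleted_positions:
                yield row


# Hypothetical data: an update writes key 3 into a new file and marks the old copy deleted.
base = DataFile(
    rows=[(3, "old_address"), (7, "unchanged_address")],
    deleted_positions={0},
)
delta = DataFile(rows=[(3, "c_address_update")])

result = list(scan([base, delta]))
print(result)
```

In this model each file is read independently, and stale rows are dropped by position rather than by comparing keys across files.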
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/quick-start/doris-paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..68d6fba0c90
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,270 @@
+---
+{
+ "title": "Apache Doris & Paimon 快速开始",
+ "language": "zh-CN"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
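+As an open data management architecture, the Data Lakehouse combines the high performance and real-time capabilities of data warehouses with the low cost and flexibility of data lakes. It helps users meet a wide range of data processing and analysis needs more conveniently, and it is increasingly used in enterprise big data systems.
+
+Over the past several versions, Apache Doris has steadily deepened its integration with data lakes and has developed a mature Data Lakehouse solution.
+
+- In version 0.15, Apache Doris introduced Hive and Iceberg external tables, exploring integration with data lakes on top of Apache Iceberg.
+- In version 1.2, Apache Doris officially introduced the Multi-Catalog feature, which provides automatic metadata mapping and data access for a variety of data sources, together with many performance optimizations for external data reading and query execution. With these changes, Doris is fully capable of supporting a fast, easy-to-use Lakehouse architecture.
+- In version 2.1, Apache Doris' Data Lakehouse architecture was strengthened across the board: it improved reading and writing for mainstream data lake formats (Hudi, Iceberg, Paimon, etc.), added compatibility with multiple SQL dialects, and enabled seamless migration from existing systems to Apache Doris. For data science and large-scale data reading scenarios, Doris integrated the Arrow Flight high-speed reading interface, achieving a 100-fold improvement in data transfer efficiency.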
+
+
+
+## Apache Doris & Paimon
+
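+Apache Paimon is a data lake format that combines the advantages of data lake formats with an LSM structure, bringing efficient real-time streaming updates to data lake architectures. This lets Paimon manage data efficiently and support real-time analysis, which makes it a strong foundation for a real-time lakehouse architecture.
+
+To fully leverage Paimon's capabilities and improve query efficiency for Paimon data, Apache Doris provides native support for several of Paimon's latest features:
+
+- Supports multiple types of Paimon Catalog, such as Hive Metastore and FileSystem.
+- Natively supports the Primary Key Table Read Optimized feature released in Paimon 0.6.
+- Natively supports the Primary Key Table Deletion Vector feature released in Paimon 0.8.
+
+With Apache Doris' high-performance query engine and Apache Paimon's efficient real-time streaming updates, users can achieve:
+
+- Real-time data ingestion into the lake: With Paimon's LSM-Tree model, ingestion latency can be reduced to the minute level. Paimon also supports several update modes, including aggregation, deduplication, and partial column updates, which makes data flows more flexible and efficient.
+- High-performance data processing and analysis: Paimon features such as Append Only Table, Read Optimized, and Deletion Vector integrate with Doris' query engine, enabling fast queries and analysis on lake data.
+
+In the future, Apache Doris will gradually support more advanced Apache Paimon features, including Time Travel and incremental data reading, to jointly build a unified, high-performance, real-time lakehouse platform.
+
+This article explains how to quickly set up an Apache Doris + Apache Paimon test and demonstration environment with Docker, and demonstrates how to use each feature.
+
+For more information, see [Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md).
+
+## User Guide
+
+All scripts and code used in this article are available at: [https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)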
+
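+### 01 Environment Preparation
+
+The examples in this article are deployed with Docker Compose, using the following components and versions:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | Default 2.1.5, can be modified |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After startup, use the following scripts to log in to the Flink command line or the Doris command line: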
+
+ ```
+ # Log in to the Flink SQL client
+ bash ./start_flink_client.sh
+
+ # Log in to the Doris client
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging in to the Flink command line, you can see a pre-built table. The table already contains some data, which can be viewed with Flink SQL.
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`, `c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+|         name |           type |  null |                         key | extras | watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+|    c_custkey |            INT | FALSE | PRI(c_custkey, c_nationkey) |        |           |
+|       c_name |    VARCHAR(25) |  TRUE |                             |        |           |
+|    c_address |    VARCHAR(40) |  TRUE |                             |        |           |
+|  c_nationkey |            INT | FALSE | PRI(c_custkey, c_nationkey) |        |           |
+|      c_phone |       CHAR(15) |  TRUE |                             |        |           |
+|    c_acctbal | DECIMAL(12, 2) |  TRUE |                             |        |           |
+| c_mktsegment |       CHAR(10) |  TRUE |                             |        |           |
+|    c_comment |   VARCHAR(117) |  TRUE |                             |        |           |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
+### 04 Data Query
+
+As shown below, a Catalog named `paimon` has already been created in the Doris cluster (run `SHOW CATALOGS` to view it). The statement used to create this Catalog is:
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
+You can log in to Doris and query the Paimon data:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Read Incremental Data
+
+You can update the data in the Paimon table using Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey = 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+After the Flink SQL job finishes, the latest data is directly visible in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPC-DS 1000 dataset stored in Paimon 0.8, comparing Apache Doris 2.1.5 with Trino 422, both with the Primary Key Table Read Optimized feature enabled.
+
+
+
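+The results show that, on this standard static test set, Doris's average query performance is 3 to 5 times that of Trino. In the future, we will optimize Deletion Vector handling to further improve query efficiency in real business scenarios.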
+
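+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in Apache Paimon 0.6 lets the query engine read the underlying Parquet/ORC files directly, which significantly improves read efficiency. Incremental data that has not yet been merged (changes produced by INSERT, UPDATE, or DELETE) is read through Merge-on-Read. In addition, Paimon introduced the Deletion Vector feature in version 0.8, which further improves the query engine's efficiency when reading incremental data.
+Apache Doris reads Deletion Vectors through its native reader and performs Merge-on-Read. The EXPLAIN output below shows how baseline data and incremental data are read within a single query.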
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner) |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ............... |
+| |
+| 0:VPAIMON_SCAN_NODE(68) |
+| table: customer |
+| predicates: (c_nationkey[#3] < 3) |
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4 |
+| partition=3/0 |
+| backends: |
+| 10002 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc start: 0 length: 66600 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc start: 0 length: 44501 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc start: 0 length: 64059 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc start: 0 length: 63164 |
+| cardinality=18751, numNodes=1 |
+| pushdown agg=NONE |
+| paimonNativeReadSplits=4/4 |
+| PaimonSplitStats: |
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| tuple ids: 0 |
+| ............... |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
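+The output shows that the query on the table just updated through Flink SQL is planned as 4 splits, and that all of them can be read by the native reader (paimonNativeReadSplits=4/4). In addition, hasDeletionVector is true for the first split, which means the split has a corresponding Deletion Vector and rows are filtered against it when the split is read.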
+
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/get-starting/quick-start/doris-paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..c46e7531e19
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,269 @@
+---
+{
+ "title": "Apache Doris & Paimon 快速开始",
+ "language": "zh-CN"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
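+As an open data management architecture, the Data Lakehouse combines the high performance and real-time capabilities of data warehouses with the low cost and flexibility of data lakes. It helps users meet a wide range of data processing and analysis needs more conveniently, and it is increasingly used in enterprise big data systems.
+
+Over the past several versions, Apache Doris has steadily deepened its integration with data lakes and has developed a mature Data Lakehouse solution.
+
+- In version 0.15, Apache Doris introduced Hive and Iceberg external tables, exploring integration with data lakes on top of Apache Iceberg.
+- In version 1.2, Apache Doris officially introduced the Multi-Catalog feature, which provides automatic metadata mapping and data access for a variety of data sources, together with many performance optimizations for external data reading and query execution. With these changes, Doris is fully capable of supporting a fast, easy-to-use Lakehouse architecture.
+- In version 2.1, Apache Doris' Data Lakehouse architecture was strengthened across the board: it improved reading and writing for mainstream data lake formats (Hudi, Iceberg, Paimon, etc.), added compatibility with multiple SQL dialects, and enabled seamless migration from existing systems to Apache Doris. For data science and large-scale data reading scenarios, Doris integrated the Arrow Flight high-speed reading interface, achieving a 100-fold improvement in data transfer efficiency.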
+
+
+
+## Apache Doris & Paimon
+
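+Apache Paimon is a data lake format that combines the strengths of data lake
formats with those of the LSM structure, bringing efficient real-time streaming
updates into the data lake architecture. This allows Paimon to manage data
efficiently and support real-time analysis, providing strong support for
building a real-time Data Lakehouse architecture.
+
+To make full use of Paimon's capabilities and improve query efficiency on
Paimon data, Apache Doris natively supports several of Paimon's latest
features:
+
+- Support for multiple types of Paimon Catalog, such as Hive Metastore and
FileSystem.
+- Native support for the Primary Key Table Read Optimized feature released in
Paimon 0.6.
+- Native support for the Primary Key Table Deletion Vector feature released in
Paimon 0.8.
+
+With the high-performance query engine of Apache Doris and the efficient
real-time streaming updates of Apache Paimon, users can achieve:
+
+- Real-time data ingestion into the lake: with Paimon's LSM-Tree model, the
latency of data ingestion into the lake can be reduced to the minute level. In
addition, Paimon supports multiple data update capabilities, including
aggregation, deduplication, and partial column updates, making data flow more
flexible and efficient.
+- High-performance data processing and analysis: technologies provided by
Paimon, such as Append Only Table, Read Optimized, and Deletion Vector, work
with the powerful query engine of Doris to deliver fast query and analysis
responses on lake data.
+
+In the future, Apache Doris will gradually support more advanced Apache Paimon
features, including Time Travel and incremental data reading, to jointly build
a unified, high-performance, real-time lakehouse platform.
+
+This article explains how to quickly set up an Apache Doris + Apache Paimon
test and demo environment in Docker and demonstrates how to use each feature.
+
+For more information, see
[Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md).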
+
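+## User Guide
+
+All scripts and code used in this article are available at:
[https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)
+
+### 01 Environment Preparation
+
+The examples in this article are deployed with Docker Compose. The components
and versions are as follows:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | Default 2.1.5, can be modified |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After startup, use the following scripts to log in to the Flink command
line or the Doris command line:
+
+ ```
+ # log in to Flink
+ bash ./start_flink_client.sh
+
+ # log in to Doris
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging in to the Flink command line, you can see a pre-built table. The
table already contains some data, which can be viewed with Flink SQL.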
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`,
`c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| name | type | null | key | extras
| watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| c_custkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_name | VARCHAR(25) | TRUE | |
| |
+| c_address | VARCHAR(40) | TRUE | |
| |
+| c_nationkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_phone | CHAR(15) | TRUE | |
| |
+| c_acctbal | DECIMAL(12, 2) | TRUE | |
| |
+| c_mktsegment | CHAR(10) | TRUE | |
| |
+| c_comment | VARCHAR(117) | TRUE | |
| |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
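+### 04 Data Query
+
+As shown below, a Catalog named `paimon` has already been created in the Doris
cluster (you can view it with `SHOW CATALOGS`). The following statement was
used to create it: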
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
+You can log in to Doris and query the Paimon data:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Reading Incremental Data
+
+We can update the data in the Paimon table with Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey = 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+Once the Flink SQL job has finished, the latest data can be queried directly in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPCDS 1000 dataset stored in Paimon (version 0.8),
using Apache Doris 2.1.5 and Trino 422, both with the Primary Key Table Read
Optimized feature enabled.
+
+
+
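+The test results show that, on this standard static test set, the average
query performance of Doris is 3 to 5 times that of Trino. We plan to further
optimize Deletion Vector handling to improve query efficiency in real business
scenarios.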
+
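+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in
Apache Paimon 0.6 lets the query engine access the underlying Parquet/ORC files
directly, which greatly improves read efficiency for baseline data. Incremental
data that has not yet been merged (the increments produced by INSERT, UPDATE,
or DELETE) can be read through Merge-on-Read. In addition, Paimon 0.8
introduced the Deletion Vector feature, which further improves the efficiency
with which query engines read incremental data.
+Apache Doris can read Deletion Vectors through its native Reader and perform
Merge-on-Read. The following Doris EXPLAIN statement demonstrates how baseline
data and incremental data are read within a single query.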
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner) |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ............... |
+| |
+| 0:VPAIMON_SCAN_NODE(68) |
+| table: customer |
+| predicates: (c_nationkey[#3] < 3) |
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4 |
+| partition=3/0 |
+| backends: |
+| 10002 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc start: 0 length: 66600 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc start: 0 length: 44501 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc start: 0 length: 64059 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc start: 0 length: 63164 |
+| cardinality=18751, numNodes=1 |
+| pushdown agg=NONE |
+| paimonNativeReadSplits=4/4 |
+| PaimonSplitStats: |
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| tuple ids: 0 |
+| ............... |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
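+As the output shows, the table just updated through Flink SQL has 4 splits, and
all of them can be read by the Native Reader (paimonNativeReadSplits=4/4). In
addition, the first split has hasDeletionVector=true, which means it has a
corresponding Deletion Vector and rows are filtered against it at read time.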
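
When checking many queries, reading these fields by eye becomes tedious. The sketch below pulls the same two signals out of saved EXPLAIN text. The function name and the returned keys are hypothetical, and the regular expressions assume the field layout shown in the output above, which may differ in other Doris versions.

```python
import re


def summarize_paimon_scan(explain_text: str) -> dict:
    """Extract native-read split counts and Deletion Vector flags from EXPLAIN text."""
    ratio = re.search(r"paimonNativeReadSplits=(\d+)/(\d+)", explain_text)
    if ratio is None:
        raise ValueError("no paimonNativeReadSplits entry found")
    native, total = int(ratio.group(1)), int(ratio.group(2))
    # re.S tolerates line wraps between the fields of one SplitStat entry.
    dv_flags = re.findall(
        r"SplitStat \[.*?hasDeletionVector=(true|false)\]", explain_text, flags=re.S
    )
    return {
        "native_splits": native,
        "total_splits": total,
        "all_native": native == total,
        "splits_with_deletion_vector": dv_flags.count("true"),
    }


# Lines copied from the EXPLAIN output in this guide.
sample = """
paimonNativeReadSplits=4/4
PaimonSplitStats:
SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true]
SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false]
SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false]
"""
summary = summarize_paimon_scan(sample)
print(summary)
```

A result with `all_native` set to false would suggest that some splits are not being read by the Native Reader.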
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/get-starting/quick-start/doris-paimon.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..c46e7531e19
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,269 @@
+---
+{
+ "title": "Apache Doris & Paimon 快速开始",
+ "language": "zh-CN"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
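+As a new, open data management architecture, the Data Lakehouse combines the
high performance and real-time capabilities of data warehouses with the low
cost and flexibility of data lakes, helping users meet a wide range of data
processing and analysis needs more conveniently. It is being adopted more and
more widely in enterprise big data systems.
+
+Over the past several versions, Apache Doris has steadily deepened its
integration with data lakes and has evolved into a mature Data Lakehouse
solution.
+
+- Since version 0.15, Apache Doris has supported Hive and Iceberg external
tables, a first step in combining its capabilities with data lakes on top of
Apache Iceberg.
+- Since version 1.2, Apache Doris has officially provided the Multi-Catalog
feature, which enables automatic metadata mapping and data access for many
kinds of data sources, along with numerous performance optimizations for
external data reading and query execution. With it, Doris is fully capable of
building a fast and easy-to-use Lakehouse architecture.
+- In version 2.1, the Apache Doris Data Lakehouse architecture was
comprehensively strengthened. It enhanced the read and write capabilities for
mainstream data lake formats (Hudi, Iceberg, Paimon, etc.) and introduced
compatibility with multiple SQL dialects, allowing seamless migration from
existing systems to Apache Doris. For data science and large-scale data
reading scenarios, Doris integrated the Arrow Flight high-speed reading
interface, achieving a 100-fold improvement in data transfer efficiency.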
+
+
+
+## Apache Doris & Paimon
+
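+Apache Paimon is a data lake format that combines the strengths of data lake
formats with those of the LSM structure, bringing efficient real-time streaming
updates into the data lake architecture. This allows Paimon to manage data
efficiently and support real-time analysis, providing strong support for
building a real-time Data Lakehouse architecture.
+
+To make full use of Paimon's capabilities and improve query efficiency on
Paimon data, Apache Doris natively supports several of Paimon's latest
features:
+
+- Support for multiple types of Paimon Catalog, such as Hive Metastore and
FileSystem.
+- Native support for the Primary Key Table Read Optimized feature released in
Paimon 0.6.
+- Native support for the Primary Key Table Deletion Vector feature released in
Paimon 0.8.
+
+With the high-performance query engine of Apache Doris and the efficient
real-time streaming updates of Apache Paimon, users can achieve:
+
+- Real-time data ingestion into the lake: with Paimon's LSM-Tree model, the
latency of data ingestion into the lake can be reduced to the minute level. In
addition, Paimon supports multiple data update capabilities, including
aggregation, deduplication, and partial column updates, making data flow more
flexible and efficient.
+- High-performance data processing and analysis: technologies provided by
Paimon, such as Append Only Table, Read Optimized, and Deletion Vector, work
with the powerful query engine of Doris to deliver fast query and analysis
responses on lake data.
+
+In the future, Apache Doris will gradually support more advanced Apache Paimon
features, including Time Travel and incremental data reading, to jointly build
a unified, high-performance, real-time lakehouse platform.
+
+This article explains how to quickly set up an Apache Doris + Apache Paimon
test and demo environment in Docker and demonstrates how to use each feature.
+
+For more information, see
[Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md).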
+
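+## User Guide
+
+All scripts and code used in this article are available at:
[https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)
+
+### 01 Environment Preparation
+
+The examples in this article are deployed with Docker Compose. The components
and versions are as follows:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | Default 2.1.5, can be modified |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After startup, use the following scripts to log in to the Flink command
line or the Doris command line:
+
+ ```
+ # log in to Flink
+ bash ./start_flink_client.sh
+
+ # log in to Doris
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging in to the Flink command line, you can see a pre-built table. The
table already contains some data, which can be viewed with Flink SQL.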
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`,
`c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| name | type | null | key | extras
| watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| c_custkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_name | VARCHAR(25) | TRUE | |
| |
+| c_address | VARCHAR(40) | TRUE | |
| |
+| c_nationkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_phone | CHAR(15) | TRUE | |
| |
+| c_acctbal | DECIMAL(12, 2) | TRUE | |
| |
+| c_mktsegment | CHAR(10) | TRUE | |
| |
+| c_comment | VARCHAR(117) | TRUE | |
| |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
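+### 04 Data Query
+
+As shown below, a Catalog named `paimon` has already been created in the Doris
cluster (you can view it with `SHOW CATALOGS`). The following statement was
used to create it: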
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
+You can log in to Doris and query the Paimon data:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Reading Incremental Data
+
+We can update the data in the Paimon table with Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey = 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+Once the Flink SQL job has finished, the latest data can be queried directly in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPCDS 1000 dataset stored in Paimon (version 0.8),
using Apache Doris 2.1.5 and Trino 422, both with the Primary Key Table Read
Optimized feature enabled.
+
+
+
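+The test results show that, on this standard static test set, the average
query performance of Doris is 3 to 5 times that of Trino. We plan to further
optimize Deletion Vector handling to improve query efficiency in real business
scenarios.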
+
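+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in
Apache Paimon 0.6 lets the query engine access the underlying Parquet/ORC files
directly, which greatly improves read efficiency for baseline data. Incremental
data that has not yet been merged (the increments produced by INSERT, UPDATE,
or DELETE) can be read through Merge-on-Read. In addition, Paimon 0.8
introduced the Deletion Vector feature, which further improves the efficiency
with which query engines read incremental data.
+Apache Doris can read Deletion Vectors through its native Reader and perform
Merge-on-Read. The following Doris EXPLAIN statement demonstrates how baseline
data and incremental data are read within a single query.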
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner) |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ............... |
+| |
+| 0:VPAIMON_SCAN_NODE(68) |
+| table: customer |
+| predicates: (c_nationkey[#3] < 3) |
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4 |
+| partition=3/0 |
+| backends: |
+| 10002 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc start: 0 length: 66600 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc start: 0 length: 44501 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc start: 0 length: 64059 |
+| s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc start: 0 length: 63164 |
+| cardinality=18751, numNodes=1 |
+| pushdown agg=NONE |
+| paimonNativeReadSplits=4/4 |
+| PaimonSplitStats: |
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
+| tuple ids: 0 |
+| ............... |
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
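+As the output shows, the table just updated through Flink SQL has 4 splits, and
all of them can be read by the Native Reader (paimonNativeReadSplits=4/4). In
addition, the first split has hasDeletionVector=true, which means it has a
corresponding Deletion Vector and rows are filtered against it at read time.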
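
For contrast with the Deletion Vector path, the Merge-on-Read idea mentioned in this section can be sketched as a merge by primary key. This is a conceptual model only: the record layout, the sequence numbers, and the `old_address` value are hypothetical, and the real merge logic in Paimon and Doris is more involved.

```python
from typing import Dict, List, Optional, Tuple

# (primary key, sequence number, value); a value of None marks a delete
Record = Tuple[int, int, Optional[str]]


def merge_on_read(baseline: List[Record], increments: List[Record]) -> Dict[int, str]:
    """Merge baseline and incremental records, keeping the newest version per key."""
    latest: Dict[int, Tuple[int, Optional[str]]] = {}
    for key, seq, value in baseline + increments:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value)
    # Drop keys whose newest record is a delete marker.
    return {key: value for key, (_, value) in latest.items() if value is not None}


baseline = [(3, 1, "MG9kdTD2WBHm"), (513, 1, "old_address")]
increments = [(3, 2, "c_address_update"), (513, 2, "c_address_update")]
merged = merge_on_read(baseline, increments)
print(merged)
```

Every read pays for this per-key comparison, which is the cost that reading raw files plus a Deletion Vector is meant to avoid.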
\ No newline at end of file
diff --git a/sidebars.json b/sidebars.json
index 5ab70c0cb21..72d6884ac48 100644
--- a/sidebars.json
+++ b/sidebars.json
@@ -10,7 +10,8 @@
"label": "Quick Start",
"items": [
"get-starting/quick-start/quick-start",
- "get-starting/quick-start/doris-hudi"
+ "get-starting/quick-start/doris-hudi",
+ "get-starting/quick-start/doris-paimon"
]
}
]
diff --git a/static/images/quick-start/lakehouse-paimon-arch.jpeg
b/static/images/quick-start/lakehouse-paimon-arch.jpeg
new file mode 100644
index 00000000000..91caf1194de
Binary files /dev/null and
b/static/images/quick-start/lakehouse-paimon-arch.jpeg differ
diff --git a/static/images/quick-start/lakehouse-paimon-benchmark.PNG
b/static/images/quick-start/lakehouse-paimon-benchmark.PNG
new file mode 100644
index 00000000000..8fe32925816
Binary files /dev/null and
b/static/images/quick-start/lakehouse-paimon-benchmark.PNG differ
diff --git
a/versioned_docs/version-2.1/get-starting/quick-start/doris-paimon.md
b/versioned_docs/version-2.1/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..cf1956c6e2d
--- /dev/null
+++ b/versioned_docs/version-2.1/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,270 @@
+---
+{
+ "title": "Apache Doris & Paimon Quick Start",
+ "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+As a new open data management architecture, the Data Lakehouse integrates the
high performance and real-time capabilities of data warehouses with the low
cost and flexibility of data lakes, helping users more conveniently meet
various data processing and analysis needs. It has been increasingly applied in
enterprise big data systems.
+
+Over recent versions, Apache Doris has deepened its integration with data lakes
and now offers a mature Data Lakehouse solution.
+
+- Since version 0.15, Apache Doris has supported Hive and Iceberg external
tables, an early step in combining its capabilities with data lakes on top of
Apache Iceberg.
+- Starting from version 1.2, Apache Doris officially introduced the
Multi-Catalog feature, achieving automatic metadata mapping and data access for
various data sources, along with many performance optimizations for external
data reading and query execution. It now fully possesses the ability to build a
high-speed and user-friendly Lakehouse architecture.
+- In version 2.1, Apache Doris' Data Lakehouse architecture was significantly
enhanced, strengthening the reading and writing capabilities of mainstream data
lake formats (Hudi, Iceberg, Paimon, etc.), introducing compatibility with
multiple SQL dialects, and seamless migration from existing systems to Apache
Doris. For data science and large-scale data reading scenarios, Doris
integrated the Arrow Flight high-speed reading interface, achieving a 100-fold
improvement in data transfer eff [...]
+
+
+
+## Apache Doris & Paimon
+
+Apache Paimon is a data lake format that innovatively combines the advantages
of data lake formats and LSM structures, successfully introducing efficient
real-time streaming update capabilities into data lake architecture. This
enables Paimon to efficiently manage data and perform real-time analysis,
providing strong support for building real-time Data Lakehouse architecture.
+
+To fully leverage Paimon's capabilities and improve query efficiency for
Paimon data, Apache Doris provides native support for several of Paimon's
latest features:
+
+- Supports various types of Paimon Catalogs such as Hive Metastore and
FileSystem.
+- Native support for Paimon 0.6's Primary Key Table Read Optimized feature.
+- Native support for Paimon 0.8's Primary Key Table Deletion Vector feature.
+
+With Apache Doris' high-performance query engine and Apache Paimon's efficient
real-time streaming update capabilities, users can achieve:
+
+- Real-time data ingestion into the lake: with Paimon's LSM-Tree model, the
latency of data ingestion into the lake can be reduced to the minute level.
Additionally, Paimon supports various data update capabilities, including
aggregation, deduplication, and partial column updates, making data flow more
flexible and efficient.
+- High-performance data processing and analysis: Paimon's technologies such as
Append Only Table, Read Optimized, and Deletion Vector can be seamlessly
integrated with Doris' powerful query engine, enabling fast querying and
analysis responses for lake data.
+
+In the future, Apache Doris will gradually support more advanced features of
Apache Paimon, including Time Travel and incremental data reading, to jointly
build a unified, high-performance, real-time lakehouse platform.
+
+This article explains how to quickly set up an Apache Doris + Apache Paimon
test and demo environment with Docker and demonstrates how to use each
feature.
+
+For more information, see
[Paimon Catalog](../../lakehouse/datalake-analytics/paimon.md).
+
+## User Guide
+
+All scripts and code mentioned in this article can be obtained from the
following address:
[https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)
+
+### 01 Environment Preparation
+
+This article uses Docker Compose for deployment, with the following components
and versions:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | Default 2.1.5, can be modified |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After starting, you can use the following scripts to log in to the Flink
command line or Doris command line:
+
+ ```
+ # log in to Flink
+ bash ./start_flink_client.sh
+
+ # log in to Doris
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging into the Flink command line, you can see a pre-built table. The
table already contains some data that can be viewed using Flink SQL.
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`,
`c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| name | type | null | key | extras
| watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| c_custkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_name | VARCHAR(25) | TRUE | |
| |
+| c_address | VARCHAR(40) | TRUE | |
| |
+| c_nationkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_phone | CHAR(15) | TRUE | |
| |
+| c_acctbal | DECIMAL(12, 2) | TRUE | |
| |
+| c_mktsegment | CHAR(10) | TRUE | |
| |
+| c_comment | VARCHAR(117) | TRUE | |
| |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
+### 04 Data Query
+
+A Catalog named `paimon` has already been created in the Doris cluster (run
`SHOW CATALOGS` to see it). It was created with the following statement:
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
+You can query Paimon's data in Doris:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Read Incremental Data
+
+You can update the data in the Paimon table using Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey
= 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+After the Flink SQL execution is complete, you can directly view the latest
data in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPC-DS 1000 dataset stored in Paimon 0.8,
comparing Apache Doris 2.1.5 with Trino 422, both with the Primary Key Table
Read Optimized feature enabled.
+
+
+
+The results show that Doris's average query performance on this standard
static test set is 3 to 5 times that of Trino. Future work will optimize
Deletion Vector handling to further improve query efficiency in real
business scenarios.
+
+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in
Apache Paimon 0.6 lets the query engine directly access the underlying
Parquet/ORC files, significantly improving the reading efficiency of
baseline data. Unmerged incremental data (data increments generated by
INSERT, UPDATE, or DELETE) can be read through Merge-on-Read. In
addition, Paimon introduced the Deletion Vector feature in version 0.8, which
further enhances the query engine's [...]
+Apache Doris can read Deletion Vectors through its native reader and perform
Merge-on-Read. The EXPLAIN statement below shows how a single query reads
baseline data and incremental data.
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner)
|
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ...............
|
+|
|
+| 0:VPAIMON_SCAN_NODE(68)
|
+| table: customer
|
+| predicates: (c_nationkey[#3] < 3)
|
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4
|
+| partition=3/0
|
+| backends:
|
+| 10002
|
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc
start: 0 length: 66600 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc
start: 0 length: 44501 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc
start: 0 length: 64059 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc
start: 0 length: 63164 |
+| cardinality=18751, numNodes=1
|
+| pushdown agg=NONE
|
+| paimonNativeReadSplits=4/4
|
+| PaimonSplitStats:
|
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true,
hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true,
hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true,
hasDeletionVector=false] |
+| tuple ids: 0
+| ...............
|
|
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
+The table just updated by Flink SQL is read as 4 splits, and all of them can
be accessed through the native reader (paimonNativeReadSplits=4/4). In
addition, hasDeletionVector is true for the first split, indicating that the
split has a corresponding Deletion Vector and that rows are filtered
according to it when reading.
+
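The two properties discussed above (native reads and Deletion Vectors) can also be checked programmatically instead of by eye. The following is a minimal sketch that is not part of the commit or of Doris: `summarize_splits` and the regular expression are hypothetical helpers, and they assume the `SplitStat [...]` line format shown in the EXPLAIN output above.

```python
import re

# Assumed line format, copied from the EXPLAIN VERBOSE output in this article.
# \s* between fields tolerates lines that were wrapped by a terminal or mailer.
SPLIT_STAT = re.compile(
    r"SplitStat \[type=(\w+),\s*rowCount=(\d+),"
    r"\s*rawFileConvertable=(\w+),\s*hasDeletionVector=(\w+)\]"
)


def summarize_splits(explain_text):
    """Count the SplitStat entries found in EXPLAIN VERBOSE text (hypothetical helper)."""
    stats = SPLIT_STAT.findall(explain_text)
    return {
        "listed": len(stats),
        "native": sum(1 for kind, _, _, _ in stats if kind == "NATIVE"),
        "with_deletion_vector": sum(1 for *_, dv in stats if dv == "true"),
        "rows": sum(int(rows) for _, rows, _, _ in stats),
    }


# The three SplitStat lines shown in the plan above.
sample = """
| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
"""
summary = summarize_splits(sample)
print(summary)
```

In practice the input text would be the rows returned by `explain verbose ...` from a MySQL-protocol client; how those rows are fetched is left out here on purpose.
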
diff --git
a/versioned_docs/version-3.0/get-starting/quick-start/doris-paimon.md
b/versioned_docs/version-3.0/get-starting/quick-start/doris-paimon.md
new file mode 100644
index 00000000000..cf1956c6e2d
--- /dev/null
+++ b/versioned_docs/version-3.0/get-starting/quick-start/doris-paimon.md
@@ -0,0 +1,270 @@
+---
+{
+ "title": "Apache Doris & Paimon Quick Start",
+ "language": "en"
+}
+
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+As a new open data management architecture, the Data Lakehouse integrates the
high performance and real-time capabilities of data warehouses with the low
cost and flexibility of data lakes, helping users more conveniently meet
various data processing and analysis needs. It has been increasingly applied in
enterprise big data systems.
+
+In recent versions, Apache Doris has deepened its integration with data lakes
and has evolved a mature Data Lakehouse solution.
+
+- Since version 0.15, Apache Doris has introduced Hive and Iceberg external
tables, exploring the capabilities of combining with Apache Iceberg for data
lakes.
+- Starting from version 1.2, Apache Doris officially introduced the
Multi-Catalog feature, achieving automatic metadata mapping and data access for
various data sources, along with many performance optimizations for external
data reading and query execution. It now fully possesses the ability to build a
high-speed and user-friendly Lakehouse architecture.
+- In version 2.1, Apache Doris' Data Lakehouse architecture was significantly
enhanced, strengthening the reading and writing capabilities of mainstream data
lake formats (Hudi, Iceberg, Paimon, etc.), introducing compatibility with
multiple SQL dialects, and seamless migration from existing systems to Apache
Doris. For data science and large-scale data reading scenarios, Doris
integrated the Arrow Flight high-speed reading interface, achieving a 100-fold
improvement in data transfer eff [...]
+
+
+
+## Apache Doris & Paimon
+
+Apache Paimon is a data lake format that innovatively combines the advantages
of data lake formats and LSM structures, successfully introducing efficient
real-time streaming update capabilities into data lake architecture. This
enables Paimon to efficiently manage data and perform real-time analysis,
providing strong support for building real-time Data Lakehouse architecture.
+
+To fully leverage Paimon's capabilities and improve query efficiency for
Paimon data, Apache Doris provides native support for several of Paimon's
latest features:
+
+- Supports various types of Paimon Catalogs such as Hive Metastore and
FileSystem.
+- Native support for Paimon 0.6's Primary Key Table Read Optimized feature.
+- Native support for Paimon 0.8's Primary Key Table Deletion Vector feature.
+
+With Apache Doris' high-performance query engine and Apache Paimon's efficient
real-time streaming update capabilities, users can achieve:
+
+- Real-time data ingestion into the lake: Leveraging Paimon's LSM-Tree model,
data ingestion into the lake can be reduced to a minute-level timeliness.
Additionally, Paimon supports various data update capabilities including
aggregation, deduplication, and partial column updates, making data flow more
flexible and efficient.
+- High-performance data processing and analysis: Paimon's technologies such as
Append Only Table, Read Optimized, and Deletion Vector can be seamlessly
integrated with Doris' powerful query engine, enabling fast querying and
analysis responses for lake data.
+
+In the future, Apache Doris will gradually support more advanced features of
Apache Paimon, including Time Travel and incremental data reading, to jointly
build a unified, high-performance, real-time lakehouse platform.
+
+This article will explain how to quickly set up an Apache Doris + Apache
Paimon testing & demonstration environment in a Docker environment and
demonstrate the usage of various features.
+
+For more information, please refer to [Paimon
Catalog](../../lakehouse/datalake-analytics/paimon.md)
+
+## User Guide
+
+All scripts and code mentioned in this article can be obtained from the
following address:
[https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon](https://github.com/apache/doris/tree/master/samples/datalake/iceberg_and_paimon)
+
+### 01 Environment Preparation
+
+This article uses Docker Compose for deployment, with the following components
and versions:
+
+| Component | Version |
+| --- | --- |
+| Apache Doris | 2.1.5 by default (can be modified) |
+| Apache Paimon | 0.8 |
+| Apache Flink | 1.18 |
+| MinIO | RELEASE.2024-04-29T09-56-05Z |
+
+### 02 Environment Deployment
+
+1. Start all components
+
+ `bash ./start_all.sh`
+
+2. After starting, you can use the following scripts to log in to the Flink
command line or Doris command line:
+
+ ```
+ # Log in to the Flink SQL client
+ bash ./start_flink_client.sh
+
+ # Log in to the Doris client
+ bash ./start_doris_client.sh
+ ```
+
+### 03 Data Preparation
+
+After logging into the Flink command line, you can see a pre-built table. The
table already contains some data that can be viewed using Flink SQL.
+
+```
+Flink SQL> use paimon.db_paimon;
+[INFO] Execute statement succeed.
+
+Flink SQL> show tables;
++------------+
+| table name |
++------------+
+| customer |
++------------+
+1 row in set
+
+Flink SQL> show create table customer;
++------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------+
+| CREATE TABLE `paimon`.`db_paimon`.`customer` (
+ `c_custkey` INT NOT NULL,
+ `c_name` VARCHAR(25),
+ `c_address` VARCHAR(40),
+ `c_nationkey` INT NOT NULL,
+ `c_phone` CHAR(15),
+ `c_acctbal` DECIMAL(12, 2),
+ `c_mktsegment` CHAR(10),
+ `c_comment` VARCHAR(117),
+ CONSTRAINT `PK_c_custkey_c_nationkey` PRIMARY KEY (`c_custkey`,
`c_nationkey`) NOT ENFORCED
+) PARTITIONED BY (`c_nationkey`)
+WITH (
+ 'bucket' = '1',
+ 'path' = 's3://warehouse/wh/db_paimon.db/customer',
+ 'deletion-vectors.enabled' = 'true'
+)
+ |
++-------------------------------------------------------------------------+
+1 row in set
+
+Flink SQL> desc customer;
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| name | type | null | key | extras
| watermark |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+| c_custkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_name | VARCHAR(25) | TRUE | |
| |
+| c_address | VARCHAR(40) | TRUE | |
| |
+| c_nationkey | INT | FALSE | PRI(c_custkey, c_nationkey) |
| |
+| c_phone | CHAR(15) | TRUE | |
| |
+| c_acctbal | DECIMAL(12, 2) | TRUE | |
| |
+| c_mktsegment | CHAR(10) | TRUE | |
| |
+| c_comment | VARCHAR(117) | TRUE | |
| |
++--------------+----------------+-------+-----------------------------+--------+-----------+
+8 rows in set
+
+Flink SQL> select * from customer order by c_custkey limit 4;
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment |
c_comment |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platel... |
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic... |
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic,...
|
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tl... |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final, furious
... |
++-----------+--------------------+--------------------------------+-------------+-----------------+-----------+--------------+--------------------------------+
+4 rows in set
+```
+
+### 04 Data Query
+
+A Catalog named `paimon` has already been created in the Doris cluster (run
`SHOW CATALOGS` to see it). It was created with the following statement:
+
+```
+-- Already created; no need to execute
+CREATE CATALOG `paimon` PROPERTIES (
+ "type" = "paimon",
+ "warehouse" = "s3://warehouse/wh/",
+ "s3.endpoint"="http://minio:9000",
+ "s3.access_key"="admin",
+ "s3.secret_key"="password",
+ "s3.region"="us-east-1"
+);
+```
+
+You can query Paimon's data in Doris:
+
+```
+mysql> use paimon.db_paimon;
+Reading table information for completion of table and column names
+You can turn off this feature to get a quicker startup with -A
+
+Database changed
+mysql> show tables;
++---------------------+
+| Tables_in_db_paimon |
++---------------------+
+| customer |
++---------------------+
+1 row in set (0.00 sec)
+
+mysql> select * from customer order by c_custkey limit 4;
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address |
c_nationkey | c_phone | c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 1 | Customer#000000001 | IVhzIApeRb ot,c,E |
15 | 25-989-741-2988 | 711.56 | BUILDING | to the even, regular
platelets. regular, ironic epitaphs nag e
|
+| 2 | Customer#000000002 | XSTf4,NCwDVaWNe6tEgvwfmRchLXak |
13 | 23-768-687-3665 | 121.65 | AUTOMOBILE | l accounts. blithely
ironic theodolites integrate boldly: caref
|
+| 3 | Customer#000000003 | MG9kdTD2WBHm |
1 | 11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly
ironic, even instructions. express foxes detect slyly. blithely even accounts
abov |
+| 32 | Customer#000000032 | jD2xZzi UmId,DCtNBLXKj9q0Tlp2iQ6ZcO3J |
15 | 25-430-914-2194 | 3471.53 | BUILDING | cial ideas. final,
furious requests across the e
|
++-----------+--------------------+---------------------------------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+4 rows in set (1.89 sec)
+```
+
+### 05 Read Incremental Data
+
+You can update the data in the Paimon table using Flink SQL:
+
+```
+Flink SQL> update customer set c_address='c_address_update' where c_nationkey
= 1;
+[INFO] Submitting SQL update statement to the cluster...
+[INFO] SQL update statement has been successfully submitted to the cluster:
+Job ID: ff838b7b778a94396b332b0d93c8f7ac
+```
+
+After the Flink SQL execution is complete, you can directly view the latest
data in Doris:
+
+```
+mysql> select * from customer where c_nationkey=1 limit 2;
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| c_custkey | c_name | c_address | c_nationkey | c_phone
| c_acctbal | c_mktsegment | c_comment
|
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+| 3 | Customer#000000003 | c_address_update | 1 |
11-719-748-3364 | 7498.12 | AUTOMOBILE | deposits eat slyly ironic, even
instructions. express foxes detect slyly. blithely even accounts abov |
+| 513 | Customer#000000513 | c_address_update | 1 |
11-861-303-6887 | 955.37 | HOUSEHOLD | press along the quickly regular
instructions. regular requests against the carefully ironic s |
++-----------+--------------------+-----------------+-------------+-----------------+-----------+--------------+--------------------------------------------------------------------------------------------------------+
+2 rows in set (0.19 sec)
+```
+
+### Benchmark
+
+We ran a simple test on the TPC-DS 1000 dataset stored in Paimon 0.8,
comparing Apache Doris 2.1.5 with Trino 422, both with the Primary Key Table
Read Optimized feature enabled.
+
+
+
+The results show that Doris's average query performance on this standard
static test set is 3 to 5 times that of Trino. Future work will optimize
Deletion Vector handling to further improve query efficiency in real
business scenarios.
+
+## Query Optimization
+
+For baseline data, the Primary Key Table Read Optimized feature introduced in
Apache Paimon 0.6 lets the query engine directly access the underlying
Parquet/ORC files, significantly improving the reading efficiency of
baseline data. Unmerged incremental data (data increments generated by
INSERT, UPDATE, or DELETE) can be read through Merge-on-Read. In
addition, Paimon introduced the Deletion Vector feature in version 0.8, which
further enhances the query engine's [...]
+Apache Doris can read Deletion Vectors through its native reader and perform
Merge-on-Read. The EXPLAIN statement below shows how a single query reads
baseline data and incremental data.
+
+```
+mysql> explain verbose select * from customer where c_nationkey < 3;
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| Explain String(Nereids Planner)
|
++------------------------------------------------------------------------------------------------------------------------------------------------+
+| ...............
|
+|
|
+| 0:VPAIMON_SCAN_NODE(68)
|
+| table: customer
|
+| predicates: (c_nationkey[#3] < 3)
|
+| inputSplitNum=4, totalFileSize=238324, scanRanges=4
|
+| partition=3/0
|
+| backends:
|
+| 10002
|
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-15cee5b7-1bd7-42ca-9314-56d92c62c03b-0.orc
start: 0 length: 66600 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=1/bucket-0/data-5d50255a-2215-4010-b976-d5dc656f3444-0.orc
start: 0 length: 44501 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=2/bucket-0/data-e98fb7ef-ec2b-4ad5-a496-713cb9481d56-0.orc
start: 0 length: 64059 |
+|
s3://warehouse/wh/db_paimon.db/customer/c_nationkey=0/bucket-0/data-431be05d-50fa-401f-9680-d646757d0f95-0.orc
start: 0 length: 63164 |
+| cardinality=18751, numNodes=1
|
+| pushdown agg=NONE
|
+| paimonNativeReadSplits=4/4
|
+| PaimonSplitStats:
|
+| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true,
hasDeletionVector=true] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true,
hasDeletionVector=false] |
+| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true,
hasDeletionVector=false] |
+| tuple ids: 0
+| ...............
|
|
++------------------------------------------------------------------------------------------------------------------------------------------------+
+67 rows in set (0.23 sec)
+```
+
+The table just updated by Flink SQL is read as 4 splits, and all of them can
be accessed through the native reader (paimonNativeReadSplits=4/4). In
addition, hasDeletionVector is true for the first split, indicating that the
split has a corresponding Deletion Vector and that rows are filtered
according to it when reading.
+
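The two properties discussed above (native reads and Deletion Vectors) can also be checked programmatically instead of by eye. The following is a minimal sketch that is not part of the commit or of Doris: `summarize_splits` and the regular expression are hypothetical helpers, and they assume the `SplitStat [...]` line format shown in the EXPLAIN output above.

```python
import re

# Assumed line format, copied from the EXPLAIN VERBOSE output in this article.
# \s* between fields tolerates lines that were wrapped by a terminal or mailer.
SPLIT_STAT = re.compile(
    r"SplitStat \[type=(\w+),\s*rowCount=(\d+),"
    r"\s*rawFileConvertable=(\w+),\s*hasDeletionVector=(\w+)\]"
)


def summarize_splits(explain_text):
    """Count the SplitStat entries found in EXPLAIN VERBOSE text (hypothetical helper)."""
    stats = SPLIT_STAT.findall(explain_text)
    return {
        "listed": len(stats),
        "native": sum(1 for kind, _, _, _ in stats if kind == "NATIVE"),
        "with_deletion_vector": sum(1 for *_, dv in stats if dv == "true"),
        "rows": sum(int(rows) for _, rows, _, _ in stats),
    }


# The three SplitStat lines shown in the plan above.
sample = """
| SplitStat [type=NATIVE, rowCount=1542, rawFileConvertable=true, hasDeletionVector=true] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
| SplitStat [type=NATIVE, rowCount=750, rawFileConvertable=true, hasDeletionVector=false] |
"""
summary = summarize_splits(sample)
print(summary)
```

In practice the input text would be the rows returned by `explain verbose ...` from a MySQL-protocol client; how those rows are fetched is left out here on purpose.
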
diff --git a/versioned_sidebars/version-2.1-sidebars.json
b/versioned_sidebars/version-2.1-sidebars.json
index 2cda90a0d18..f9cbb22a673 100644
--- a/versioned_sidebars/version-2.1-sidebars.json
+++ b/versioned_sidebars/version-2.1-sidebars.json
@@ -10,7 +10,8 @@
"label": "Quick Start",
"items": [
"get-starting/quick-start/quick-start",
- "get-starting/quick-start/doris-hudi"
+ "get-starting/quick-start/doris-hudi",
+ "get-starting/quick-start/doris-paimon"
]
}
]
@@ -1538,4 +1539,4 @@
]
}
]
-}
\ No newline at end of file
+}
diff --git a/versioned_sidebars/version-3.0-sidebars.json
b/versioned_sidebars/version-3.0-sidebars.json
index 8c138ad1560..b93beb4cfdb 100644
--- a/versioned_sidebars/version-3.0-sidebars.json
+++ b/versioned_sidebars/version-3.0-sidebars.json
@@ -10,7 +10,8 @@
"label": "Quick Start",
"items": [
"get-starting/quick-start/quick-start",
- "get-starting/quick-start/doris-hudi"
+ "get-starting/quick-start/doris-hudi",
+ "get-starting/quick-start/doris-paimon"
]
}
]