This is an automated email from the ASF dual-hosted git repository.
eldenmoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new e3c8d318b72 Add Storage Format v3 (#3329)
e3c8d318b72 is described below
commit e3c8d318b729dd816f45ca6929cfc2ac49646e3a
Author: lihangyu <[email protected]>
AuthorDate: Tue Feb 3 16:57:30 2026 +0800
Add Storage Format v3 (#3329)
## Versions
- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/data-operate/import/complex-types/variant.md | 22 +++++++
.../sql-data-types/semi-structured/VARIANT.md | 1 +
docs/table-design/column-compression.md | 3 +
docs/table-design/storage-format.md | 69 ++++++++++++++++++++++
.../data-operate/import/complex-types/variant.md | 22 +++++++
.../sql-data-types/semi-structured/VARIANT.md | 1 +
.../current/table-design/column-compression.md | 3 +
.../current/table-design/storage-format.md | 68 +++++++++++++++++++++
.../data-operate/import/complex-types/variant.md | 22 +++++++
.../sql-data-types/semi-structured/VARIANT.md | 1 +
.../version-4.x/table-design/column-compression.md | 3 +
.../version-4.x/table-design/storage-format.md | 68 +++++++++++++++++++++
sidebars.ts | 1 +
store_format_v3.md | 44 ++++++++++++++
.../data-operate/import/complex-types/variant.md | 22 +++++++
.../sql-data-types/semi-structured/VARIANT.md | 1 +
.../version-4.x/table-design/column-compression.md | 3 +
.../version-4.x/table-design/storage-format.md | 69 ++++++++++++++++++++++
versioned_sidebars/version-4.x-sidebars.json | 1 +
19 files changed, 424 insertions(+)
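
Note on the `[length(varuint)][raw_data]` streaming layout that the new storage-format.md page in this commit attributes to `BINARY_PLAIN_ENCODING_V2`: the idea can be illustrated with a minimal sketch. This is not the Doris implementation. It assumes an LEB128-style varuint (7 payload bits per byte, high bit as a continuation flag), which the commit does not specify, and the helper names `encode_varuint`, `decode_varuint`, `encode_page`, and `decode_page` are hypothetical.

```python
# Illustrative sketch only: a length-prefixed layout for binary values.
# Assumption: "varuint" is an LEB128-style unsigned varint; the actual
# on-disk format used by Doris may differ.

def encode_varuint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("varuint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


def decode_varuint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one varuint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_page(values: list[bytes]) -> bytes:
    """Write each value as [length(varuint)][raw_data], back to back."""
    out = bytearray()
    for value in values:
        out += encode_varuint(len(value))
        out += value
    return bytes(out)


def decode_page(buf: bytes) -> list[bytes]:
    """Read values sequentially; no trailing offset table is needed."""
    values = []
    pos = 0
    while pos < len(buf):
        length, pos = decode_varuint(buf, pos)
        values.append(buf[pos:pos + length])
        pos += length
    return values
```

In this sketch a value shorter than 128 bytes costs one byte of length prefix, whereas a trailing table of fixed-width offsets would cost a fixed number of bytes per value regardless of its size. The trade-off is that values are located by scanning forward through the prefixes rather than by direct lookup in an offset table.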
diff --git a/docs/data-operate/import/complex-types/variant.md
b/docs/data-operate/import/complex-types/variant.md
index 894e7c05083..974c2231b85 100644
--- a/docs/data-operate/import/complex-types/variant.md
+++ b/docs/data-operate/import/complex-types/variant.md
@@ -12,6 +12,28 @@ The VARIANT type can store semi-structured JSON data,
allowing for the storage o
Supports CSV and JSON formats.
+## Storage Optimization (V3)
+
+For wide tables with a large number of dynamic sub-columns (e.g., more than
2000 columns) generated by the `VARIANT` type, it is highly recommended to
enable **Storage Format V3**.
+
+### Advantages of V3 for Variant
+- **Metadata Decoupling**: V3 moves column metadata (`ColumnMetaPB`) out of
the Segment Footer into a separate area. This prevents the Footer from becoming
too large when there are thousands of dynamic columns, significantly speeding
up file opening and reducing memory overhead.
+- **On-demand Loading**: Metadata is loaded only when needed, which is
particularly beneficial for cloud-native storage where object storage access
latency is higher.
+- **Compact Storage**: Uses `BINARY_PLAIN_ENCODING_V2` to eliminate large
trailing offset tables, making storage more compact for string and JSONB types
within `VARIANT`.
+- **Improved Numerical Scan Performance**: Integer types default to
`PLAIN_ENCODING`, which provides higher read throughput when combined with
LZ4/ZSTD.
+
+To enable V3, specify the storage format in the table properties:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
+
## Loading CSV Format
### Step 1: Prepare Data
diff --git
a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index 09c0cde98f8..a406c55b6f0 100644
--- a/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++ b/docs/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -442,6 +442,7 @@ See the “Configuration” section below for the full property
list.
## Limitations
+- **Wide tables optimization**: For wide tables with a large number of dynamic
sub-columns (e.g., more than 2000 columns) generated by the `VARIANT` type, it
is highly recommended to enable **Storage Format V3** by specifying
`"storage_format" = "V3"` in the table `PROPERTIES`. This decouples column
metadata from the Segment Footer, speeding up file opening and reducing memory
overhead.
- JSON key length ≤ 255.
- Cannot be a primary key or sort key.
- Cannot be nested within other types (e.g., `Array<Variant>`,
`Struct<Variant>`).
diff --git a/docs/table-design/column-compression.md
b/docs/table-design/column-compression.md
index b6504be902b..60cfb0950b2 100644
--- a/docs/table-design/column-compression.md
+++ b/docs/table-design/column-compression.md
@@ -41,6 +41,9 @@ Doris supports various compression algorithms, each with
different trade-offs be
**Encoding Before Compression**
Before compressing data, Doris encodes the column data (e.g., **dictionary
encoding**, **run-length encoding**, etc.) to transform the data into a form
more suitable for compression, further enhancing compression efficiency.
+**Storage Format V3 Optimizations**
+ Starting from Doris Storage Format V3, the encoding strategy for numerical
types has been further optimized. It defaults to `PLAIN_ENCODING` for integer
types, which, when combined with LZ4/ZSTD, provides higher read throughput and
lower CPU overhead. For more details, see [Storage Format V3](./storage-format).
+
**Page Compression**
Doris adopts a **page**-level compression strategy. The data in each column
is divided into multiple pages, and the data within each page is compressed
independently. By compressing by page, Doris can efficiently handle large-scale
datasets while ensuring high compression ratios and decompression performance.
diff --git a/docs/table-design/storage-format.md
b/docs/table-design/storage-format.md
new file mode 100644
index 00000000000..ed45f472049
--- /dev/null
+++ b/docs/table-design/storage-format.md
@@ -0,0 +1,69 @@
+---
+{
+ "title": "Storage Format V3",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Doris Storage Format V3 is a major evolution from the Segment V2 format.
Through metadata decoupling and encoding strategy optimization, it specifically
improves performance for wide tables, complex data types (such as Variant), and
cloud-native storage-compute separation scenarios.
+
+## Key Optimizations
+
+### External Column Meta
+* **Background**: In Segment V2, metadata for all columns (`ColumnMetaPB`) is stored in the Footer of the Segment file. For wide tables with thousands of columns, or for `VARIANT` columns whose set of sub-columns grows automatically, the Footer can grow to several megabytes.
+* **Optimization**: V3 decouples `ColumnMetaPB` from the Footer and stores
it in a separate area within the file (External Column Meta Area).
+* **Benefits**:
+ * **Ultra-fast Metadata Loading**: Significantly reduces Segment Footer
size, speeding up initial file opening.
+ * **On-demand Loading**: Metadata can be loaded on demand from the
independent area, reducing memory usage and improving cold start query
performance on object storage (like S3/OSS).
+
+### Integer Type Plain Encoding
+* **Optimization**: V3 defaults to `PLAIN_ENCODING` (raw binary storage) for integer types (such as `INT` and `BIGINT`), instead of the traditional BitShuffle encoding.
+* **Benefits**: Combined with LZ4/ZSTD compression, `PLAIN_ENCODING` provides higher read throughput and lower CPU overhead. In modern high-speed IO environments, dropping the extra encoding step and relying on LZ4/ZSTD alone offers a clear advantage when scanning large volumes of data.
+
+### Binary Plain Encoding V2
+* **Optimization**: Introduces `BINARY_PLAIN_ENCODING_V2`, using a
`[length(varuint)][raw_data]` streaming layout, replacing the old format that
relied on trailing offset tables.
+* **Benefits**: Eliminates large trailing offset tables, making data storage
more compact and significantly reducing storage consumption for string and
JSONB types.
+
+## Design Philosophy
+The design philosophy of V3 can be summarized as: **"Metadata Decoupling,
Encoding Simplification, and Streaming Layout"**. By reducing metadata
processing bottlenecks and leveraging the high efficiency of modern CPUs in
processing simple encodings, it achieves high-performance analysis under
complex schemas.
+
+## Use Cases
+- **Wide Tables**: Tables with more than 2000 columns or long column names.
+- **Semi-structured Data**: Heavy use of `VARIANT` or `JSON` types.
+- **Tiered Storage/Cloud Native**: Scenarios sensitive to object storage
loading latency.
+- **High-performance Scanning**: Analytical tasks with extreme requirements
for scan throughput.
+
+## Usage
+
+### Enable When Creating a New Table
+Specify `storage_format` as `V3` in the `PROPERTIES` of the `CREATE TABLE`
statement:
+
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/variant.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/variant.md
index cff7d906f27..39271b61a9a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/variant.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/complex-types/variant.md
@@ -12,6 +12,28 @@ VARIANT 类型可以存储半结构化的 JSON 数据,允许存储包含不同
支持 CSV 和 JSON 格式。
+## 存储优化 (V3)
+
+对于使用 `VARIANT` 类型可能产生大量动态子列(例如超过 2000 列)的大宽表场景,强烈建议开启 **V3 存储格式**。
+
+### V3 对 Variant 的优势
+- **元数据解耦**:V3 将列元数据 (`ColumnMetaPB`) 从 Segment Footer
中剥离,提升了拥有成千上万个动态列时的文件初次打开速度,并显著降低内存占用。
+- **按需加载**:元数据可以按需加载,这在存算分离场景下能显著降低访问对象存储的延迟。
+- **更紧凑的存储**:通过 `BINARY_PLAIN_ENCODING_V2` 消除了庞大的末尾偏移表,降低了 `VARIANT` 中字符串和
JSONB 类型的存储空间。
+- **更高的数值扫描速度**:整数类型默认采用 `PLAIN_ENCODING`,配合 LZ4/ZSTD 时能提供更高的读取吞吐。
+
+在建表时可以通过以下属性开启 V3:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
+
## CSV 格式导入
### 第 1 步:准备数据
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index a513d431898..c7f202a9aab 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -440,6 +440,7 @@ SELECT * FROM tbl WHERE v['str'] MATCH 'Doris';
## 限制
+- **大宽表优化**:针对 `VARIANT` 类型可能产生大量动态子列(例如超过 2000 列)的大宽表场景,强烈建议开启 **V3
存储格式**。通过在建表 `PROPERTIES` 中指定 `"storage_format" = "V3"`,可以将列元数据与 Segment Footer
解耦,加快文件打开速度并降低内存占用。
- JSON key 长度 ≤ 255。
- 不支持作为主键或排序键。
- 不支持与其他类型嵌套(如 `Array<Variant>`、`Struct<Variant>`)。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/column-compression.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/column-compression.md
index 5a26362fe54..a920eb27be6 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/column-compression.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/column-compression.md
@@ -41,6 +41,9 @@ Doris 支持多种压缩算法,每种算法在压缩率和解压速度之间
**压缩前的编码**
在压缩数据之前,Doris 会对列数据进行编码(例如**字典编码**、**游程编码**等),将数据转换为更适合压缩的形式,从而进一步提升压缩效率。
+**存储格式 V3 优化**
+ 从 Doris 存储格式 V3 开始,数值类型的编码策略得到了进一步优化。它针对整数类型默认采用 `PLAIN_ENCODING`,配合
LZ4/ZSTD 压缩时,能够提供更高的读取吞吐量和更低的 CPU 开销。详情请参考 [存储格式 V3](./storage-format)。
+
**按页压缩**
Doris 采用 **页(Page)** 级别的压缩策略。每一列的数据会被分成多个页,每个页内的数据会独立进行压缩。通过按页压缩,Doris
能够高效地处理大规模数据集,同时保证高效的压缩率和解压性能。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md
new file mode 100644
index 00000000000..6e393f939a4
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/storage-format.md
@@ -0,0 +1,68 @@
+---
+{
+ "title": "存储格式 V3",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Apache Doris 存储格式 V3 是在 Segment V2
格式基础上进行的重大演进。它通过元数据解耦与编码策略优化,专门针对大宽表、复杂数据类型(如 Variant)以及云原生存算分离场景提升性能。
+
+## 核心优化点
+
+### 外部列元数据 (External Column Meta)
+* **优化背景**:在 Segment V2 中,所有列的元数据(`ColumnMetaPB`)都存储在 Segment 文件的 Footer
中。对于拥有数千列的大宽表或自动扩容的 Variant 场景,Footer 可能会膨胀到几 MB。
+* **优化思路**:V3 将 `ColumnMetaPB` 从 Footer 中剥离,转而存储在文件内的独立区域(External Column
Meta Area)。
+* **收益**:
+ * **极速元数据加载**:显著减小 Segment Footer 体积,加快文件初次打开速度。
+ * **按需加载**:元数据可以按需从独立区域加载,降低内存占用,提升对象存储(如 S3/OSS)上的冷启动查询性能。
+
+### 数值类型 Plain 编码模式 (Integer Type Plain Encoding)
+* **优化思路**:V3 默认将数值类型(如 `INT`, `BIGINT`)切换为 `PLAIN_ENCODING`(原始二进制存储),而非传统的
BitShuffle。
+* **收益**:配合 LZ4/ZSTD 压缩时,`PLAIN_ENCODING` 提供了更高的读取吞吐量和更低的 CPU 开销。在现代高速 IO
环境下,这种“解压换性能”的策略在扫描大体量数据时优势明显。
+
+### 二进制 Plain 编码 V2 (Binary Plain Encoding V2)
+* **优化思路**:引入 `BINARY_PLAIN_ENCODING_V2`,采用 `[长度(varuint)][原始数据]`
的流式布局,取代了依赖末尾偏移表(Offsets)的旧格式。
+* **收益**:消除了末尾庞大的偏移表,数据存储更加紧凑,有效降低了字符串和 JSONB 类型的存储空间占用。
+
+## 设计哲学
+V3 的设计哲学可以总结为:**“元数据解耦、编码简化、流式布局”**。通过减少元数据处理瓶颈和利用现代 CPU
对简单编码的高处理效率,实现在复杂模式下的高性能分析。
+
+## 使用场景
+- **大宽表**:字段数量超过 2000 个,或字段名冗长。
+- **半结构化数据**:大量使用 `VARIANT`,且物化列数超过 2000 列。
+- **冷热分离/云原生**:对对象存储加载延迟敏感的场景。
+- **高性能扫描**:对 Scan 吞吐量有极致要求的分析任务。
+
+## 使用方式
+
+### 创建新表时启用
+在建表语句的 `PROPERTIES` 中指定 `storage_format` 为 `V3`:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/complex-types/variant.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/complex-types/variant.md
index cff7d906f27..39271b61a9a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/complex-types/variant.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/data-operate/import/complex-types/variant.md
@@ -12,6 +12,28 @@ VARIANT 类型可以存储半结构化的 JSON 数据,允许存储包含不同
支持 CSV 和 JSON 格式。
+## 存储优化 (V3)
+
+对于使用 `VARIANT` 类型可能产生大量动态子列(例如超过 2000 列)的大宽表场景,强烈建议开启 **V3 存储格式**。
+
+### V3 对 Variant 的优势
+- **元数据解耦**:V3 将列元数据 (`ColumnMetaPB`) 从 Segment Footer
中剥离,提升了拥有成千上万个动态列时的文件初次打开速度,并显著降低内存占用。
+- **按需加载**:元数据可以按需加载,这在存算分离场景下能显著降低访问对象存储的延迟。
+- **更紧凑的存储**:通过 `BINARY_PLAIN_ENCODING_V2` 消除了庞大的末尾偏移表,降低了 `VARIANT` 中字符串和
JSONB 类型的存储空间。
+- **更高的数值扫描速度**:整数类型默认采用 `PLAIN_ENCODING`,配合 LZ4/ZSTD 时能提供更高的读取吞吐。
+
+在建表时可以通过以下属性开启 V3:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
+
## CSV 格式导入
### 第 1 步:准备数据
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index ffbc758a4e2..df0b20822b9 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -390,6 +390,7 @@ SELECT * FROM tbl WHERE v['str'] MATCH 'Doris';
## 限制
+- **大宽表优化**:针对 `VARIANT` 类型可能产生大量动态子列(例如超过 2000 列)的大宽表场景,强烈建议开启 **V3
存储格式**。通过在建表 `PROPERTIES` 中指定 `"storage_format" = "V3"`,可以将列元数据与 Segment Footer
解耦,加快文件打开速度并降低内存占用。
- `variant_max_subcolumns_count`:默认 0(不限制 Path 物化列数)。建议在生产设置为 2048(Tablet
级别)以控制列数。超过阈值后,低频/稀疏路径会被收敛到共享数据结构,从该结构查询可能带来性能下降(详见“配置”)。
- 若 Schema Template 指定了 Path 类型,则该 Path 会被强制提取;当
`variant_enable_typed_paths_to_sparse = true` 时,它也会计入阈值,可能被收敛到共享结构。
- JSON key 长度 ≤ 255。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/column-compression.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/column-compression.md
index 5a26362fe54..a920eb27be6 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/column-compression.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/column-compression.md
@@ -41,6 +41,9 @@ Doris 支持多种压缩算法,每种算法在压缩率和解压速度之间
**压缩前的编码**
在压缩数据之前,Doris 会对列数据进行编码(例如**字典编码**、**游程编码**等),将数据转换为更适合压缩的形式,从而进一步提升压缩效率。
+**存储格式 V3 优化**
+ 从 Doris 存储格式 V3 开始,数值类型的编码策略得到了进一步优化。它针对整数类型默认采用 `PLAIN_ENCODING`,配合
LZ4/ZSTD 压缩时,能够提供更高的读取吞吐量和更低的 CPU 开销。详情请参考 [存储格式 V3](./storage-format)。
+
**按页压缩**
Doris 采用 **页(Page)** 级别的压缩策略。每一列的数据会被分成多个页,每个页内的数据会独立进行压缩。通过按页压缩,Doris
能够高效地处理大规模数据集,同时保证高效的压缩率和解压性能。
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md
new file mode 100644
index 00000000000..6e393f939a4
--- /dev/null
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/table-design/storage-format.md
@@ -0,0 +1,68 @@
+---
+{
+ "title": "存储格式 V3",
+ "language": "zh-CN"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Apache Doris 存储格式 V3 是在 Segment V2
格式基础上进行的重大演进。它通过元数据解耦与编码策略优化,专门针对大宽表、复杂数据类型(如 Variant)以及云原生存算分离场景提升性能。
+
+## 核心优化点
+
+### 外部列元数据 (External Column Meta)
+* **优化背景**:在 Segment V2 中,所有列的元数据(`ColumnMetaPB`)都存储在 Segment 文件的 Footer
中。对于拥有数千列的大宽表或自动扩容的 Variant 场景,Footer 可能会膨胀到几 MB。
+* **优化思路**:V3 将 `ColumnMetaPB` 从 Footer 中剥离,转而存储在文件内的独立区域(External Column
Meta Area)。
+* **收益**:
+ * **极速元数据加载**:显著减小 Segment Footer 体积,加快文件初次打开速度。
+ * **按需加载**:元数据可以按需从独立区域加载,降低内存占用,提升对象存储(如 S3/OSS)上的冷启动查询性能。
+
+### 数值类型 Plain 编码模式 (Integer Type Plain Encoding)
+* **优化思路**:V3 默认将数值类型(如 `INT`, `BIGINT`)切换为 `PLAIN_ENCODING`(原始二进制存储),而非传统的
BitShuffle。
+* **收益**:配合 LZ4/ZSTD 压缩时,`PLAIN_ENCODING` 提供了更高的读取吞吐量和更低的 CPU 开销。在现代高速 IO
环境下,这种“解压换性能”的策略在扫描大体量数据时优势明显。
+
+### 二进制 Plain 编码 V2 (Binary Plain Encoding V2)
+* **优化思路**:引入 `BINARY_PLAIN_ENCODING_V2`,采用 `[长度(varuint)][原始数据]`
的流式布局,取代了依赖末尾偏移表(Offsets)的旧格式。
+* **收益**:消除了末尾庞大的偏移表,数据存储更加紧凑,有效降低了字符串和 JSONB 类型的存储空间占用。
+
+## 设计哲学
+V3 的设计哲学可以总结为:**“元数据解耦、编码简化、流式布局”**。通过减少元数据处理瓶颈和利用现代 CPU
对简单编码的高处理效率,实现在复杂模式下的高性能分析。
+
+## 使用场景
+- **大宽表**:字段数量超过 2000 个,或字段名冗长。
+- **半结构化数据**:大量使用 `VARIANT`,且物化列数超过 2000 列。
+- **冷热分离/云原生**:对对象存储加载延迟敏感的场景。
+- **高性能扫描**:对 Scan 吞吐量有极致要求的分析任务。
+
+## 使用方式
+
+### 创建新表时启用
+在建表语句的 `PROPERTIES` 中指定 `storage_format` 为 `V3`:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
diff --git a/sidebars.ts b/sidebars.ts
index 45bc9a1229a..15e7ed3e1ce 100644
--- a/sidebars.ts
+++ b/sidebars.ts
@@ -117,6 +117,7 @@ const sidebars: SidebarsConfig = {
},
'table-design/data-type',
'table-design/column-compression',
+ 'table-design/storage-format',
{
type: 'category',
label: 'Table Indexes',
diff --git a/store_format_v3.md b/store_format_v3.md
new file mode 100644
index 00000000000..8a9c3022e5a
--- /dev/null
+++ b/store_format_v3.md
@@ -0,0 +1,44 @@
+# Apache Doris 存储格式 V3 (Storage Format V3)
+
+Doris 存储格式 V3 是在 Segment V2 格式基础上进行的重大演进。它通过元数据解耦与编码策略优化,专门针对大宽表、复杂数据类型(如
Variant)以及云原生存算分离场景提升性能。
+
+## 1. 核心优化点
+
+### 1.1 外部列元数据 (External Column Meta)
+* **优化背景**:在 Segment V2 中,所有列的元数据(`ColumnMetaPB`)都存储在 Segment 文件的 Footer
中。对于拥有数千列的大宽表或自动扩容的 Variant 场景,Footer 可能会膨胀到几 MB。
+* **优化思路**:V3 将 `ColumnMetaPB` 从 Footer 中剥离,转而存储在文件内的独立区域(External Column
Meta Area)。
+* **收益**:
+ * **极速元数据加载**:显著减小 Segment Footer 体积,加快文件初次打开速度。
+ * **按需加载**:元数据可以按需从独立区域加载,降低内存占用,提升对象存储(如 S3/OSS)上的冷启动查询性能。
+
+### 1.2 数值类型 Plain 编码模式 (Integer Type Plain Encoding)
+* **优化思路**:V3 默认将数值类型(如 `INT`, `BIGINT`)切换为 `PLAIN_ENCODING`(原始二进制存储),而非传统的
BitShuffle。
+* **收益**:配合 LZ4/ZSTD 压缩时,`PLAIN_ENCODING` 提供了更高的读取吞吐量和更低的 CPU 开销。在现代高速 IO
环境下,这种“解压换性能”的策略在扫描大体量数据时优势明显。
+
+### 1.3 二进制 Plain 编码 V2 (Binary Plain Encoding V2)
+* **优化思路**:引入 `BINARY_PLAIN_ENCODING_V2`,采用 `[长度(varuint)][原始数据]`
的流式布局,取代了依赖末尾偏移表(Offsets)的旧格式。
+* **收益**:消除了末尾庞大的偏移表,数据存储更加紧凑,且更利于指令流水线进行顺序扫描(Vectorized Sequential
Scan),提升了字符串和 JSONB 类型的扫描效率。
+
+## 2. 设计哲学
+V3 的设计哲学可以总结为:**“元数据解耦、编码简化、流式布局”**。通过减少元数据处理瓶颈和利用现代 CPU
对简单编码的高处理效率,实现在复杂模式下的高性能分析。
+
+## 3. 使用场景
+- **大宽表**:字段数量超过 2000 个,或字段名冗长。
+- **半结构化数据**:大量使用 `VARIANT`,且物化列数超过 2000 列。
+- **冷热分离/云原生**:对对象存储加载延迟敏感的场景。
+- **高性能扫描**:对 Scan 吞吐量有极致要求的分析任务。
+
+## 4. 使用方式
+
+### 创建新表时启用
+在建表语句的 `PROPERTIES` 中指定 `storage_format` 为 `V3`:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
\ No newline at end of file
diff --git
a/versioned_docs/version-4.x/data-operate/import/complex-types/variant.md
b/versioned_docs/version-4.x/data-operate/import/complex-types/variant.md
index 894e7c05083..974c2231b85 100644
--- a/versioned_docs/version-4.x/data-operate/import/complex-types/variant.md
+++ b/versioned_docs/version-4.x/data-operate/import/complex-types/variant.md
@@ -12,6 +12,28 @@ The VARIANT type can store semi-structured JSON data,
allowing for the storage o
Supports CSV and JSON formats.
+## Storage Optimization (V3)
+
+For wide tables with a large number of dynamic sub-columns (e.g., more than
2000 columns) generated by the `VARIANT` type, it is highly recommended to
enable **Storage Format V3**.
+
+### Advantages of V3 for Variant
+- **Metadata Decoupling**: V3 moves column metadata (`ColumnMetaPB`) out of
the Segment Footer into a separate area. This prevents the Footer from becoming
too large when there are thousands of dynamic columns, significantly speeding
up file opening and reducing memory overhead.
+- **On-demand Loading**: Metadata is loaded only when needed, which is
particularly beneficial for cloud-native storage where object storage access
latency is higher.
+- **Compact Storage**: Uses `BINARY_PLAIN_ENCODING_V2` to eliminate large
trailing offset tables, making storage more compact for string and JSONB types
within `VARIANT`.
+- **Improved Numerical Scan Performance**: Integer types default to
`PLAIN_ENCODING`, which provides higher read throughput when combined with
LZ4/ZSTD.
+
+To enable V3, specify the storage format in the table properties:
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
+
## Loading CSV Format
### Step 1: Prepare Data
diff --git
a/versioned_docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
b/versioned_docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
index c6c200efd85..7ef90414648 100644
---
a/versioned_docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
+++
b/versioned_docs/version-4.x/sql-manual/basic-element/sql-data-types/semi-structured/VARIANT.md
@@ -390,6 +390,7 @@ SELECT * FROM tbl WHERE v['str'] MATCH 'Doris';
## Limitations
+- **Wide tables optimization**: For wide tables with a large number of dynamic
sub-columns (e.g., more than 2000 columns) generated by the `VARIANT` type, it
is highly recommended to enable **Storage Format V3** by specifying
`"storage_format" = "V3"` in the table `PROPERTIES`. This decouples column
metadata from the Segment Footer, speeding up file opening and reducing memory
overhead.
- `variant_max_subcolumns_count`: default 0 (no limit). In production, set to
2048 (tablet level) to control the number of materialized paths. Above the
threshold, low-frequency/sparse paths are moved to a shared data structure;
reading from it may be slower (see “Configuration”).
- If a path type is specified via Schema Template, that path will be forced to
be materialized; when `variant_enable_typed_paths_to_sparse = true`, it also
counts toward the threshold and may be moved to the shared structure.
- JSON key length ≤ 255.
diff --git a/versioned_docs/version-4.x/table-design/column-compression.md
b/versioned_docs/version-4.x/table-design/column-compression.md
index b6504be902b..60cfb0950b2 100644
--- a/versioned_docs/version-4.x/table-design/column-compression.md
+++ b/versioned_docs/version-4.x/table-design/column-compression.md
@@ -41,6 +41,9 @@ Doris supports various compression algorithms, each with
different trade-offs be
**Encoding Before Compression**
Before compressing data, Doris encodes the column data (e.g., **dictionary
encoding**, **run-length encoding**, etc.) to transform the data into a form
more suitable for compression, further enhancing compression efficiency.
+**Storage Format V3 Optimizations**
+ Starting from Doris Storage Format V3, the encoding strategy for numerical
types has been further optimized. It defaults to `PLAIN_ENCODING` for integer
types, which, when combined with LZ4/ZSTD, provides higher read throughput and
lower CPU overhead. For more details, see [Storage Format V3](./storage-format).
+
**Page Compression**
Doris adopts a **page**-level compression strategy. The data in each column
is divided into multiple pages, and the data within each page is compressed
independently. By compressing by page, Doris can efficiently handle large-scale
datasets while ensuring high compression ratios and decompression performance.
diff --git a/versioned_docs/version-4.x/table-design/storage-format.md
b/versioned_docs/version-4.x/table-design/storage-format.md
new file mode 100644
index 00000000000..ed45f472049
--- /dev/null
+++ b/versioned_docs/version-4.x/table-design/storage-format.md
@@ -0,0 +1,69 @@
+---
+{
+ "title": "Storage Format V3",
+ "language": "en"
+}
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Doris Storage Format V3 is a major evolution from the Segment V2 format.
Through metadata decoupling and encoding strategy optimization, it specifically
improves performance for wide tables, complex data types (such as Variant), and
cloud-native storage-compute separation scenarios.
+
+## Key Optimizations
+
+### External Column Meta
+* **Background**: In Segment V2, metadata for all columns (`ColumnMetaPB`) is stored in the Footer of the Segment file. For wide tables with thousands of columns, or for `VARIANT` columns whose set of sub-columns grows automatically, the Footer can grow to several megabytes.
+* **Optimization**: V3 decouples `ColumnMetaPB` from the Footer and stores
it in a separate area within the file (External Column Meta Area).
+* **Benefits**:
+ * **Ultra-fast Metadata Loading**: Significantly reduces Segment Footer
size, speeding up initial file opening.
+ * **On-demand Loading**: Metadata can be loaded on demand from the
independent area, reducing memory usage and improving cold start query
performance on object storage (like S3/OSS).
+
+### Integer Type Plain Encoding
+* **Optimization**: V3 defaults to `PLAIN_ENCODING` (raw binary storage) for integer types (such as `INT` and `BIGINT`), instead of the traditional BitShuffle encoding.
+* **Benefits**: Combined with LZ4/ZSTD compression, `PLAIN_ENCODING` provides higher read throughput and lower CPU overhead. In modern high-speed IO environments, dropping the extra encoding step and relying on LZ4/ZSTD alone offers a clear advantage when scanning large volumes of data.
+
+### Binary Plain Encoding V2
+* **Optimization**: Introduces `BINARY_PLAIN_ENCODING_V2`, using a
`[length(varuint)][raw_data]` streaming layout, replacing the old format that
relied on trailing offset tables.
+* **Benefits**: Eliminates large trailing offset tables, making data storage
more compact and significantly reducing storage consumption for string and
JSONB types.
+
+## Design Philosophy
+The design philosophy of V3 can be summarized as: **"Metadata Decoupling,
Encoding Simplification, and Streaming Layout"**. By reducing metadata
processing bottlenecks and leveraging the high efficiency of modern CPUs in
processing simple encodings, it achieves high-performance analysis under
complex schemas.
+
+## Use Cases
+- **Wide Tables**: Tables with more than 2000 columns or long column names.
+- **Semi-structured Data**: Heavy use of `VARIANT` or `JSON` types.
+- **Tiered Storage/Cloud Native**: Scenarios sensitive to object storage
loading latency.
+- **High-performance Scanning**: Analytical tasks with extreme requirements
for scan throughput.
+
+## Usage
+
+### Enable When Creating a New Table
+Specify `storage_format` as `V3` in the `PROPERTIES` of the `CREATE TABLE`
statement:
+
+```sql
+CREATE TABLE table_v3 (
+ id BIGINT,
+ data VARIANT
+)
+DISTRIBUTED BY HASH(id) BUCKETS 32
+PROPERTIES (
+ "storage_format" = "V3"
+);
+```
diff --git a/versioned_sidebars/version-4.x-sidebars.json
b/versioned_sidebars/version-4.x-sidebars.json
index 33f6c724640..58565161378 100644
--- a/versioned_sidebars/version-4.x-sidebars.json
+++ b/versioned_sidebars/version-4.x-sidebars.json
@@ -120,6 +120,7 @@
},
"table-design/data-type",
"table-design/column-compression",
+ "table-design/storage-format",
{
"type": "category",
"label": "Table Indexes",