(doris-website) branch master updated: [doc](iceberg) support schema change ddl for iceberg (#2596)

morningman Fri, 11 Jul 2025 17:07:57 -0700

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 58cb20982a1 [doc](iceberg) support schema change ddl for iceberg 
(#2596)
58cb20982a1 is described below

commit 58cb20982a1dedeba1c68227e19f1f09a28785a9
Author: Socrates <[email protected]>
AuthorDate: Sat Jul 12 08:07:03 2025 +0800

    [doc](iceberg) support schema change ddl for iceberg (#2596)
    
    pr: https://github.com/apache/doris/pull/52591
    ## Versions
    
    - [x] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
    
    ---------
    
    Co-authored-by: Mingyu Chen (Rayner) <[email protected]>
---
 docs/lakehouse/catalogs/iceberg-catalog.md         | 176 ++++++++++++++++++++-
 .../current/lakehouse/catalogs/iceberg-catalog.md  | 174 +++++++++++++++++++-
 2 files changed, 338 insertions(+), 12 deletions(-)

diff --git a/docs/lakehouse/catalogs/iceberg-catalog.md 
b/docs/lakehouse/catalogs/iceberg-catalog.md
index 526370fc130..a61210d4c18 100644
--- a/docs/lakehouse/catalogs/iceberg-catalog.md
+++ b/docs/lakehouse/catalogs/iceberg-catalog.md
@@ -747,11 +747,175 @@ For an Iceberg Database, you must first drop all tables 
under the database befor
 
   * Object storage
 
-## Appendix
+### Schema Change
 
-### Change Log
+Start from 3.1.0, Doris supports schema changes for Iceberg tables, which can 
be modified using the `ALTER TABLE` statement.
 
-| Doris Version | Feature Support                        |
-| -------------- | -------------------------------------- |
-| 2.1.3          | Support for ORC file format, Equality Delete |
-| 2.1.6          | Support for DDL, DML                   |
+Supported schema change operations include:
+
+* **Rename Column**
+
+Use the `RENAME COLUMN` clause to rename columns. Renaming columns within 
nested types is not supported.
+
+```sql
+ALTER TABLE iceberg_table RENAME COLUMN old_col_name TO new_col_name;
+```
+
+* **Add a Column**
+
+Use `ADD COLUMN` to add a new column. The new column will be added to the end 
of the table. Adding new columns to nested types is not supported.
+
+When adding a new column, you can specify nullable attributes, default values, 
and comments.
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN col_name col_type [nullable, [default 
default_value, [comment 'comment']]];
+```
+
+Example:
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN new_col STRING NOT NULL DEFAULT 
'default_value' COMMENT 'This is a new col';
+```
+
+* **Add Columns**
+
+You can also use `ADD COLUMN` to add multiple columns. The new columns will be 
added to the end of the table. Adding new columns to nested types is not 
supported.
+
+The syntax for each column is the same as adding a single column.
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN (col_name1 col_type1 [nullable, [default 
default_value, [comment 'comment']]], col_name2 col_type2 [nullable, [default 
default_value, [comment 'comment']]] ...);
+```
+
+* **Drop Column**
+
+Use `DROP COLUMN` to drop columns. Dropping columns within nested types is not 
supported.
+
+```sql
+ALTER TABLE iceberg_table DROP COLUMN col_name;
+```
+
+* **Modify Column**
+
+Use the `MODIFY COLUMN` statement to modify column attributes, including type, 
nullable, default value, and comment.
+
+Note: When modifying column attributes, all attributes that are not being 
modified should also be explicitly specified with their original values.
+
+```sql
+ALTER TABLE iceberg_table MODIFY COLUMN col_name col_type [nullable, [default 
default_value, [comment 'comment']]];
+```
+
+Example:
+
+```sql
+CREATE TABLE iceberg_table (
+    id INT,
+    name STRING
+);
+-- Modify the id column type to BIGINT, set as NOT NULL, default value to 0, 
and add comment
+ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 
'This is a modified id column';
+```
+
+* **Reorder Columns**
+
+Use `ORDER BY` to reorder columns by specifying the new column order.
+
+```sql
+ALTER TABLE iceberg_table ORDER BY (col_name1, col_name2, ...);
+```
+
+## Iceberg Table Optimization
+
+### View Data File Distribution
+
+You can use the following SQL to analyze the data distribution and delete file 
count of Iceberg tables, helping determine whether optimization like Compaction 
is needed.
+
+- Display data file size distribution. This can help identify if there are too 
many
+
+  ```sql
+  SELECT
+    CASE
+      WHEN file_size_in_bytes BETWEEN 0 AND 8 * 1024 * 1024 THEN '0-8M'
+      WHEN file_size_in_bytes BETWEEN 8 * 1024 * 1024 + 1 AND 32 * 1024 * 1024 
THEN '8-32M'
+      WHEN file_size_in_bytes BETWEEN 2 * 1024 * 1024 + 1 AND 128 * 1024 * 
1024 THEN '32-128M'
+      WHEN file_size_in_bytes BETWEEN 128 * 1024 * 1024 + 1 AND 512 * 1024 * 
1024 THEN '128-512M'
+      WHEN file_size_in_bytes > 512 * 1024 * 1024 THEN '> 512M'
+      ELSE 'Unknown'
+    END AS SizeRange,
+    COUNT(*) AS FileNum
+  FROM store_sales$data_files
+  GROUP BY
+    SizeRange;
+
+  +-----------+---------+
+  | SizeRange | FileNum |
+  +-----------+---------+
+  | 0-8M      |       8 |
+  | 8-32M     |       6 |
+  +-----------+---------+
+  ```
+
+- Display the number of data files and delete files
+
+  ```sql
+  SELECT
+    CASE
+      WHEN content = 0 THEN 'DataFile'
+      WHEN content = 1 THEN 'PositionDeleteFile'
+      WHEN content = 2 THEN 'EqualityDeleteFile'
+      ELSE 'Unknown'
+    END AS ContentType,
+    SUM(file_size_in_bytes) AS SizeInBytes,
+    SUM(record_count) AS Records
+  FROM
+    iceberg_table$files
+  GROUP BY
+    ContentType;
+
+  +--------------------+-------------+---------+
+  | ContentType        | SizeInBytes | Records |
+  +--------------------+-------------+---------+
+  | EqualityDeleteFile |        1786 |       4 |
+  | DataFile           |        1981 |       5 |
+  | PositionDeleteFile |         809 |       1 |
+  +--------------------+-------------+---------+
+  ```
+
+### View Snapshot and Branch
+
+```sql
+SELECT
+  refs_data.snapshot_id,
+  snapshots.committed_at,
+  snapshots.operation,
+  ARRAY_SORT(refs_data.refs)
+FROM (
+  SELECT
+    snapshot_id,
+    ARRAY_AGG(CONCAT(type, ':', name)) AS refs
+  FROM
+    iceberg_table$refs
+  GROUP BY
+    snapshot_id
+) AS refs_data
+JOIN (
+  SELECT
+    snapshot_id,
+    committed_at,
+    operation
+  FROM
+    iceberg_table$snapshots
+) AS snapshots
+ON refs_data.snapshot_id = snapshots.snapshot_id
+ORDER BY
+  snapshots.committed_at;
+
++---------------------+----------------------------+-----------+-------------------------------------+
+| snapshot_id         | committed_at               | operation | 
ARRAY_SORT(refs_data.refs)          |
++---------------------+----------------------------+-----------+-------------------------------------+
+| 8272911997874079853 | 2025-07-10 15:27:07.177000 | append    | ["BRANCH:b1", 
"TAG:t1"]             |
+| 1325777059626757917 | 2025-07-10 15:27:07.530000 | append    | ["BRANCH:b2", 
"TAG:t2"]             |
+|   76492482642020578 | 2025-07-10 15:27:07.865000 | append    | ["BRANCH:b3", 
"TAG:t3"]             |
+| 1788715857849070138 | 2025-07-12 04:15:19.626000 | append    | 
["BRANCH:main", "TAG:t4", "TAG:t5"] |
++---------------------+----------------------------+-----------+-------------------------------------+
+```
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.md
index db885b657aa..f4bcd84fa61 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.md
@@ -756,12 +756,174 @@ DROP DATABASE [IF EXISTS] iceberg.iceberg_db;
 
   * 对象存储
 
-## 附录
+### Schema 变更
 
-### 版本更新记录
+自 3.1.0 版本开始，Doris 支持 Iceberg 表的 Schema 变更（Schema Change），可以通过 `ALTER TABLE` 
语句来修改表的 Schema。
 
-| Doris 版本 | 功能支持                           |
-| -------- | ------------------------------ |
-| 2.1.3    | 支持 ORC 文件格式，支持 Equality Delete |
-| 2.1.6    | 支持 DDL，DML                     |
+支持的 Schema 变更操作包括：
 
+* **修改列名称**
+
+通过 `RENAME COLUMN` 子句修改列名称，不支持修改嵌套类型中的列名称。
+
+```sql
+ALTER TABLE iceberg_table RENAME COLUMN old_col_name TO new_col_name;
+```
+
+* **添加一列**
+
+通过 `ADD COLUMN` 添加新列，新列会被添加到表的末尾，不支持为嵌套类型添加新列。
+
+在添加新列时，可以指定 nullable 属性、默认值和注释。
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN col_name col_type [nullable, [default 
default_value, [comment 'comment']]];
+```
+
+示例：
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN new_col STRING NOT NULL DEFAULT 
'default_value' COMMENT 'This is a new col';
+```
+
+* **添加多列**
+
+可以通过 `ADD COLUMN` 添加多列，新列会被添加到表的末尾，不支持为嵌套类型添加新列。
+
+每一列的语法和添加单列时一样。
+
+```sql
+ALTER TABLE iceberg_table ADD COLUMN (col_name1 col_type1 [nullable, [default 
default_value, [comment 'comment']]], col_name2 col_type2 [nullable, [default 
default_value, [comment 'comment']]] ...);
+```
+* **删除列**
+
+通过 `DROP COLUMN` 删除列，不支持删除嵌套类型中的列。
+
+```sql
+ALTER TABLE iceberg_table DROP COLUMN col_name;
+```
+
+* **修改列**
+
+通过 `MODIFY COLUMN` 语句修改列的属性，包括类型，nullable，默认值和注释。
+
+注意：修改列的属性时，所有没有被修改的属性也应该显式地指定为原来的值。
+
+```sql
+ALTER TABLE iceberg_table MODIFY COLUMN col_name col_type [nullable, [default 
default_value, [comment 'comment']]];
+```
+
+示例：
+
+```sql
+CREATE TABLE iceberg_table (
+    id INT,
+    name STRING
+);
+-- 修改 id 列的类型为 BIGINT，设置为 NOT NULL，默认值为 0，并添加注释
+ALTER TABLE iceberg_table MODIFY COLUMN id BIGINT NOT NULL DEFAULT 0 COMMENT 
'This is a modified id column';
+```
+
+* **重新排序**
+
+通过 `ORDER BY` 重新排序列，指定新的列顺序。
+
+```sql
+ALTER TABLE iceberg_table ORDER BY (col_name1, col_name2, ...);
+```
+
+## Iceberg 表优化
+
+### 查看数据文件分布
+
+通过以下 SQL 可以分析 Iceberg 表的数据分布和 delete 文件数量等，从而判断是否需要进行优化，如 Compaction。
+
+- 显示数据文件大小分布。可以判断是否有过多的小文件。
+
+  ```sql
+  SELECT
+    CASE
+      WHEN file_size_in_bytes BETWEEN 0 AND 8 * 1024 * 1024 THEN '0-8M'
+      WHEN file_size_in_bytes BETWEEN 8 * 1024 * 1024 + 1 AND 32 * 1024 * 1024 
THEN '8-32M'
+      WHEN file_size_in_bytes BETWEEN 2 * 1024 * 1024 + 1 AND 128 * 1024 * 
1024 THEN '32-128M'
+      WHEN file_size_in_bytes BETWEEN 128 * 1024 * 1024 + 1 AND 512 * 1024 * 
1024 THEN '128-512M'
+      WHEN file_size_in_bytes > 512 * 1024 * 1024 THEN '> 512M'
+      ELSE 'Unknown'
+    END AS SizeRange,
+    COUNT(*) AS FileNum
+  FROM store_sales$data_files
+  GROUP BY
+    SizeRange;
+
+  +-----------+---------+
+  | SizeRange | FileNum |
+  +-----------+---------+
+  | 0-8M      |       8 |
+  | 8-32M     |       6 |
+  +-----------+---------+
+  ```
+
+- 显示数据文件、Delete 文件的数量
+
+  ```sql
+  SELECT
+    CASE
+      WHEN content = 0 THEN 'DataFile'
+      WHEN content = 1 THEN 'PositionDeleteFile'
+      WHEN content = 2 THEN 'EqualityDeleteFile'
+      ELSE 'Unknown'
+    END AS ContentType,
+    SUM(file_size_in_bytes) AS SizeInBytes,
+    SUM(record_count) AS Records
+  FROM
+    iceberg_table$files
+  GROUP BY
+    ContentType;
+
+  +--------------------+-------------+---------+
+  | ContentType        | SizeInBytes | Records |
+  +--------------------+-------------+---------+
+  | EqualityDeleteFile |        1786 |       4 |
+  | DataFile           |        1981 |       5 |
+  | PositionDeleteFile |         809 |       1 |
+  +--------------------+-------------+---------+
+  ```
+
+### 查看快照和分支对应情况
+
+```sql
+SELECT
+  refs_data.snapshot_id,
+  snapshots.committed_at,
+  snapshots.operation,
+  ARRAY_SORT(refs_data.refs)
+FROM (
+  SELECT
+    snapshot_id,
+    ARRAY_AGG(CONCAT(type, ':', name)) AS refs
+  FROM
+    iceberg_table$refs
+  GROUP BY
+    snapshot_id
+) AS refs_data
+JOIN (
+  SELECT
+    snapshot_id,
+    committed_at,
+    operation
+  FROM
+    iceberg_table$snapshots
+) AS snapshots
+ON refs_data.snapshot_id = snapshots.snapshot_id
+ORDER BY
+  snapshots.committed_at;
+
++---------------------+----------------------------+-----------+-------------------------------------+
+| snapshot_id         | committed_at               | operation | 
ARRAY_SORT(refs_data.refs)          |
++---------------------+----------------------------+-----------+-------------------------------------+
+| 8272911997874079853 | 2025-07-10 15:27:07.177000 | append    | ["BRANCH:b1", 
"TAG:t1"]             |
+| 1325777059626757917 | 2025-07-10 15:27:07.530000 | append    | ["BRANCH:b2", 
"TAG:t2"]             |
+|   76492482642020578 | 2025-07-10 15:27:07.865000 | append    | ["BRANCH:b3", 
"TAG:t3"]             |
+| 1788715857849070138 | 2025-07-12 04:15:19.626000 | append    | 
["BRANCH:main", "TAG:t4", "TAG:t5"] |
++---------------------+----------------------------+-----------+-------------------------------------+
+```
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris-website) branch master updated: [doc](iceberg) support schema change ddl for iceberg (#2596)

Reply via email to