This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new fdfa2a7a48c [opt](iceberg) add new iceberg table actions (#3292)
fdfa2a7a48c is described below
commit fdfa2a7a48c121a3f9509bd918f7aeac7b86e8ed
Author: Mingyu Chen (Rayner) <[email protected]>
AuthorDate: Sat Jan 17 11:06:57 2026 +0800
[opt](iceberg) add new iceberg table actions (#3292)
## Versions
- [x] dev
- [x] 4.x
- [x] 3.x
- [x] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/lakehouse/catalogs/iceberg-catalog.mdx | 306 +++++++++++++--------
.../current/lakehouse/catalogs/iceberg-catalog.mdx | 206 ++++++++++----
.../lakehouse/catalogs/iceberg-catalog.mdx | 206 ++++++++++----
.../lakehouse/catalogs/iceberg-catalog.mdx | 206 ++++++++++----
.../lakehouse/catalogs/iceberg-catalog.mdx | 306 +++++++++++++--------
.../lakehouse/catalogs/iceberg-catalog.mdx | 306 +++++++++++++--------
6 files changed, 1032 insertions(+), 504 deletions(-)
diff --git a/docs/lakehouse/catalogs/iceberg-catalog.mdx
b/docs/lakehouse/catalogs/iceberg-catalog.mdx
index 5afe8b5a49f..3a74490f01c 100644
--- a/docs/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2049,11 +2049,11 @@ Supported operations include:
## Iceberg Table Actions
-> This feature is supported since 4.0.2
+> This feature is supported starting from version 4.0.2. See specific version
requirements for each operation.
>
> This is an experimental feature.
-Doris supports executing specific actions on Iceberg tables through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various actions such as table optimization and snapshot management.
+Doris supports executing specific Iceberg table actions through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various operations such as table optimization and snapshot management.
```sql
ALTER TABLE [catalog.][database.]table_name
@@ -2061,167 +2061,251 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
[WHERE <condition>]
```
-* `action_name`: The name of the action to execute
-* `("key1" = "value1", ...)`: Action-related parameters provided as key-value
pairs. Parameter names must be wrapped in double quotes
+* `action_name`: Name of the operation to execute
+* `("key1" = "value1", ...)`: Operation-related parameters provided as
key-value pairs. Parameter names must be enclosed in double quotes
* `[WHERE <condition>]`: Optional WHERE condition to specify the partition or
data range for the operation (only supported by some operations)
-### rewrite_data_files
+### cherrypick_snapshot
+
+The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
-The `rewrite_data_files` operation rewrites data files in an Iceberg table to
implement small file compaction, optimizing query performance and storage
efficiency. This is a synchronized operation.
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
```
**Parameters:**
-| Parameter Name | Type | Default Value | Description |
-| -------------- | ---- | ------------- | ----------- |
-| `target-file-size-bytes` | Long | 536870912 (512MB) | Target file size in
bytes for output files |
-| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
-| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
-| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewrite. Rewrite only executes when the number of files to rewrite reaches this
threshold |
-| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
-| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in
bytes for a file group. Used to limit the amount of data processed by a single
rewrite task |
-| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewrite |
-| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to
total records required to trigger rewrite (delete records / total records).
Range: 0.0 - 1.0 |
-| `output-spec-id` | Long | 2 | Partition specification ID for output files |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to be merged |
**Return Value:**
-Executing `rewrite_data_files` returns a result set with the following 4
columns:
+Executing the `cherrypick_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `rewritten_data_files_count` | INT | Number of data files that were
rewritten |
-| `added_data_files_count` | INT | Number of new data files that were written |
-| `rewritten_bytes_count` | INT | Number of bytes that were rewritten |
-| `removed_delete_files_count` | BIGINT | Number of delete files that were
removed |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | The merged snapshot ID |
+| `current_snapshot_id` | BIGINT | The snapshot ID newly created and set as
current after the merge operation |
-**Examples:**
+**Example:**
```sql
--- Rewrite data files with default parameters
-ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
-
--- Specify target file size and minimum input files
+-- Merge changes from snapshot 123456789 into the current table state
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "134217728",
- "min-input-files" = "10"
-);
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
--- Use WHERE condition to rewrite specific partition data
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "268435456"
-)
-WHERE date_col = '2024-01-01';
+**Notes:**
--- Rewrite all files
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified snapshot does not exist
+3. The merge operation creates a new snapshot and does not delete the original
snapshot
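+
+Snapshot IDs for this and the other snapshot-based actions can be looked up
from the table's snapshot history. A minimal sketch follows; it assumes
Doris's `iceberg_meta` table function is available, so treat the exact name
and parameters as an assumption:
+
+```sql
+-- List the table's snapshots (snapshot_id, commit time, operation, ...),
+-- then pass the chosen snapshot_id to cherrypick_snapshot.
+SELECT * FROM iceberg_meta(
+    "table" = "iceberg_db.iceberg_table",
+    "query_type" = "snapshots"
+);
+```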
+
+### fast_forward
+
+The `fast_forward` operation quickly advances the current snapshot of one
branch to the latest snapshot of another branch.
+
+> Supported version: 4.0.2+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | Yes | Name of the branch to advance |
+| `to` | String | Yes | Target branch name; `branch` will be advanced to the
latest snapshot of this branch |
+
+**Return Value:**
+
+Executing the `fast_forward` operation returns a result set with the following
3 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | The name of the branch that was advanced |
+| `previous_ref` | BIGINT | The snapshot ID the branch pointed to before
advancing (can be NULL) |
+| `updated_ref` | BIGINT | The snapshot ID the branch points to after
advancing |
+
+**Example:**
+
+```sql
+-- Advance the feature branch to the latest snapshot of the main branch
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "rewrite-all" = "true",
- "target-file-size-bytes" = "536870912"
-);
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
```
**Notes:**
-1. The `rewrite_data_files` operation reads and rewrites data files, which
will generate additional I/O and computational overhead. Please allocate
cluster resources reasonably.
-2. Before execution, you can use the SQL in the [View Data File
Distribution](#view-data-file-distribution) section to evaluate whether a
rewrite operation is needed
-3. The WHERE condition can be used to limit the partition or data range for
rewriting. This condition will filter out files that do not contain data
matching the WHERE condition, thereby reducing the number of files and amount
of data to rewrite
-4. Before execution, you can use the SQL in the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified branch does not exist
+3. Only branches can be advanced, not tags
-### cherrypick_snapshot
+### publish_changes
-The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
+The `publish_changes` operation is the "Publish" step in the
Write-Audit-Publish (WAP) pattern. This process finds snapshots with a specific
`wap.id` attribute and adds them to the current table state. This allows users
to atomically make "staged" data visible after validation.
+
+> Supported version: 4.1.0+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to merge |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | Yes | The `wap.id` of the snapshot to publish |
**Return Value:**
-Executing `cherrypick_snapshot` returns a result set with the following 2
columns:
+Executing the `publish_changes` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `source_snapshot_id` | BIGINT | The snapshot ID whose changes were merged |
-| `current_snapshot_id` | BIGINT | The snapshot ID of the new snapshot created
and set as current after the merge operation |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The snapshot ID before publishing |
+| `current_snapshot_id` | BIGINT | The new snapshot ID after publishing |
**Example:**
```sql
--- Merge changes from snapshot 123456789 into current table state
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
```
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
-3. The merge operation creates a new snapshot and does not delete the original
snapshot
+2. The operation will fail if the specified `wap.id` does not exist
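+
+For context, the full Write-Audit-Publish flow around this command might look
like the sketch below. The staging write is assumed to happen in an external
engine such as Spark, where Iceberg's `write.wap.enabled` table property and
the `spark.wap.id` session setting tag the commit with a `wap.id` (those names
come from Iceberg's WAP documentation, not from this Doris feature); only the
final publish step runs in Doris:
+
+```sql
+-- 1. (External engine, e.g. Spark) stage the write:
+--      ALTER TABLE db.tbl SET TBLPROPERTIES ('write.wap.enabled' = 'true');
+--      SET spark.wap.id = 'audit-20240101';
+--      INSERT INTO db.tbl ...;   -- the commit is staged, not yet visible
+
+-- 2. Audit the staged data, then publish it atomically from Doris:
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "audit-20240101");
+```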
-### fast_forward
-The `fast_forward` operation fast-forwards the current snapshot of one branch
to the latest snapshot of another branch.
+### rewrite_data_files
+
+The `rewrite_data_files` operation rewrites data files in an Iceberg table to
achieve small file compaction, optimizing query performance and storage
efficiency. This is a synchronous operation.
+
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `branch` | String | Yes | Name of the branch to fast-forward |
-| `to` | String | Yes | Target branch name. The `branch` will be
fast-forwarded to the latest snapshot of this branch |
+| Parameter | Type | Default Value | Description |
+| -------- | ---- | ------ | ---- |
+| `target-file-size-bytes` | Long | 536870912 (512MB) | Target file size in
bytes for output files |
+| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
+| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
+| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewriting. Rewriting will only execute when the number of files to rewrite
reaches this threshold |
+| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
+| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size of
a single file group in bytes. Used to limit the amount of data processed by a
single rewrite task |
+| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewriting |
+| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of deleted records
required to trigger rewriting (deleted records / total records). Range: 0.0 -
1.0 |
+| `output-spec-id` | Long | 2 | Partition spec ID for output files |
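+
+As a quick illustration of how the size thresholds above interact, the sketch
below spells out the values derived from the 75% / 180% defaults for a 256MB
target (the derived numbers are plain arithmetic, not extra parameters you
need to set):
+
+```sql
+-- With a 256MB target and the other parameters left at their defaults:
+--   min-file-size-bytes = 268435456 * 0.75 = 201326592   (192MB)
+--     -> files smaller than 192MB become rewrite candidates
+--   max-file-size-bytes = 268435456 * 1.8  = 483183820.8 (~460.8MB)
+--     -> files larger than ~460.8MB are split
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files ("target-file-size-bytes" = "268435456");
+```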
**Return Value:**
-Executing `fast_forward` returns a result set with the following 3 columns:
+Executing the `rewrite_data_files` operation returns a result set with the
following 4 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `branch_updated` | STRING | Name of the branch that was fast-forwarded |
-| `previous_ref` | BIGINT | Snapshot ID the branch was pointing to before the
fast-forward operation (can be NULL) |
-| `updated_ref` | BIGINT | Snapshot ID the branch is pointing to after the
fast-forward operation |
+| ---- | ---- | ---- |
+| `rewritten_data_files_count` | INT | Number of data files rewritten |
+| `added_data_files_count` | INT | Number of new data files added |
+| `rewritten_bytes_count` | INT | Number of bytes rewritten |
+| `removed_delete_files_count` | BIGINT | Number of delete files removed |
-**Example:**
+**Examples:**
```sql
--- Fast-forward feature branch to the latest snapshot of main branch
+-- Rewrite data files with default parameters
+ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
+
+-- Specify target file size and minimum input files
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "134217728",
+ "min-input-files" = "10"
+);
+
+-- Rewrite data in specific partitions using WHERE condition
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "268435456"
+)
+WHERE date_col = '2024-01-01';
+
+-- Rewrite all files
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "rewrite-all" = "true",
+ "target-file-size-bytes" = "536870912"
+);
```
**Notes:**
-1. This operation does not support WHERE conditions
-2. If the specified branch does not exist, the operation will fail
-3. Only branches can be fast-forwarded, not tags
+1. The `rewrite_data_files` operation reads and rewrites data files, which
incurs additional I/O and computing overhead. Please allocate cluster resources
appropriately.
+2. Before execution, you can use SQL from the [View Data File
Distribution](#view-data-file-distribution) section to evaluate whether
rewriting is necessary
+3. The WHERE condition can be used to limit the partitions or data range for
rewriting. It filters out files that do not contain data matching the
condition, reducing the number of files and amount of data to rewrite
+4. Before execution, you can use SQL from the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+
+### rewrite_manifests
+
+The `rewrite_manifests` operation allows users to optimize Iceberg table
metadata by rewriting manifest files, improving query performance and reducing
metadata overhead.
+
+This addresses manifest file optimization issues in large Iceberg tables, as a
large number of small manifest files can impact query planning performance.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE rewrite_manifests();
+```
+
+**Parameters:**
+
+No parameters
+
+**Return Value:**
+
+Executing the `rewrite_manifests` operation returns a result set with the
following 2 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `rewritten_manifests_count` | BIGINT | Number of manifest files rewritten |
+| `total_data_manifests_count` | BIGINT | Number of data manifest files
rewritten |
+
+**Example:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
+```
### rollback_to_snapshot
The `rollback_to_snapshot` operation rolls back an Iceberg table to a
specified snapshot.
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2231,23 +2315,23 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"<snapshot_id>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to rollback to |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to roll back to |
**Return Value:**
-Executing `rollback_to_snapshot` returns a result set with the following 2
columns:
+Executing the `rollback_to_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is set as current
after rolling back to the specified snapshot |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID set as current after
rollback |
**Example:**
```sql
--- Rollback to snapshot 987654321
+-- Roll back to snapshot 987654321
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_snapshot ("snapshot_id" = "987654321");
```
@@ -2255,12 +2339,14 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
+2. The operation will fail if the specified snapshot does not exist
3. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
### rollback_to_timestamp
-The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified timestamp.
+The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified point in time.
+
+> Supported version: 4.0.2+
**Syntax:**
@@ -2271,27 +2357,27 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"<timestamp>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `timestamp` | String | Yes | The timestamp to rollback to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br />2. Milliseconds timestamp (e.g.,
`1704067200000`) |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `timestamp` | String | Yes | The point in time to roll back to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br/>2. Millisecond timestamp (e.g., `1704067200000`)
|
**Return Value:**
-Executing `rollback_to_timestamp` returns a result set with the following 2
columns:
+Executing the `rollback_to_timestamp` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that was current at the
specified timestamp, now set as current |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID at the specified point in
time, now set as current |
**Examples:**
```sql
--- Rollback to specified timestamp (ISO format)
+-- Roll back to specified point in time (ISO format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "2024-01-01 10:30:00.000");
--- Rollback to specified timestamp (timestamp format)
+-- Roll back to specified point in time (timestamp format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "1704067200000");
```
@@ -2299,13 +2385,15 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
**Notes:**
1. This operation does not support WHERE conditions
-2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd
HH:mm:ss.SSS`) or milliseconds timestamp format
-3. If there is no snapshot corresponding to the specified timestamp, the
operation will fail
+2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd
HH:mm:ss.SSS`) or millisecond timestamp format
+3. The operation will fail if there is no corresponding snapshot at the
specified point in time
### set_current_snapshot
The `set_current_snapshot` operation sets the current snapshot of an Iceberg
table to a specified snapshot ID or reference (branch or tag).
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2315,19 +2403,19 @@ EXECUTE set_current_snapshot ("snapshot_id" =
"<snapshot_id>" | "ref" = "<ref_na
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
| `snapshot_id` | Long | Either one | The snapshot ID to set as current |
| `ref` | String | Either one | The reference name (branch or tag) to set as
current |
**Return Value:**
-Executing `set_current_snapshot` returns a result set with the following 2
columns:
+Executing the `set_current_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
setting the new current snapshot |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is now set as current
(from `snapshot_id` parameter or resolved from `ref` parameter) |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before the
operation |
+| `current_snapshot_id` | BIGINT | The snapshot ID after the operation (from
`snapshot_id` parameter or resolved from `ref` parameter) |
**Examples:**
@@ -2349,7 +2437,7 @@ EXECUTE set_current_snapshot ("ref" = "v1.0");
1. This operation does not support WHERE conditions
2. The `snapshot_id` and `ref` parameters are mutually exclusive; only one can
be specified
-3. If the specified snapshot ID or reference does not exist, the operation
will fail
+3. The operation will fail if the specified snapshot ID or reference does not
exist
4. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
## Iceberg Table Optimization
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
index 86c73128cae..94c38890d48 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2063,7 +2063,7 @@ DROP DATABASE [IF EXISTS] iceberg.iceberg_db;
## Iceberg 表操作
-> 该功能自 4.0.2 版本开始支持。
+> 该功能自 4.0.2 版本开始支持。具体支持版本参见各个操作的说明
>
> 该功能是实验性功能。
@@ -2079,10 +2079,139 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
* `("key1" = "value1", ...)`:操作相关的参数,以键值对形式提供。参数名称需要使用双引号包裹
* `[WHERE <condition>]`:可选的 WHERE 条件,用于指定操作的分区或数据范围(仅部分操作支持)
+### cherrypick_snapshot
+
+`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | 是 | 要合并的快照 ID |
+
+**返回值:**
+
+执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
+| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
+
+**示例:**
+
+```sql
+-- 将快照 123456789 的变更合并到当前表状态
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的快照不存在,操作会失败
+3. 合并操作会创建一个新的快照,不会删除原始快照
+
+### fast_forward
+
+`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | 是 | 要推进的分支名称 |
+| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+
+**返回值:**
+
+执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | 被推进的分支名称 |
+| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
+| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+
+**示例:**
+
+```sql
+-- 将 feature 分支推进到 main 分支的最新快照
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的分支不存在,操作会失败
+3. 只能推进分支,不能推进标签(Tag)
+
+### publish_changes
+
+`publish_changes` 操作是“写入 - 审核 - 发布” (Write-Audit-Publish)
模式中的“发布”步骤。该过程会查找带有特定 `wap.id` 属性的快照,并将其添加到当前表状态中。这样,用户就可以在验证后原子性地使“暂存”数据可见。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | 是 | 要进行发布的快照的 `wap.id` |
+
+**返回值:**
+
+执行 `publish_changes` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | 发布之前的快照 ID |
+| `current_snapshot_id` | BIGINT | 发布之后的新的快照 ID |
+
+**示例:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的 `wap.id` 不存在,操作会失败
+
+
### rewrite_data_files
`rewrite_data_files` 操作用于对 Iceberg
表进行数据文件重写,实现小文件合并(Compaction)功能,以优化查询性能和存储效率。该操作是一个同步操作。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2150,92 +2279,47 @@ EXECUTE rewrite_data_files (
3. WHERE 条件可用于限制重写的分区或数据范围,这个条件会过滤掉那些不包含 符合 WHERE 条件的数据的文件,从而减少重写的文件数量和数据量
4. 执行前可以通过[重写文件选择逻辑](#重写文件选择逻辑)章节中的 SQL 来计算哪些文件会被重写
-### cherrypick_snapshot
-
-`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
-
-**语法:**
-
-```sql
-ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
-```
-
-**参数说明:**
-
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `snapshot_id` | Long | 是 | 要合并的快照 ID |
-
-**返回值:**
-
-执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
-
-| 列名 | 类型 | 描述 |
-| ---- | ---- | ---- |
-| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
-| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
-
-**示例:**
-
-```sql
--- 将快照 123456789 的变更合并到当前表状态
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
-```
+### rewrite_manifests
-**注意事项:**
+`rewrite_manifests` 操作允许用户通过重写清单文件(Manifest File)来优化 Iceberg
表元数据,从而提高查询性能并减少元数据开销。
-1. 该操作不支持 WHERE 条件
-2. 如果指定的快照不存在,操作会失败
-3. 合并操作会创建一个新的快照,不会删除原始快照
-
-### fast_forward
+这解决了大型 Iceberg 表中清单文件优化的问题,因为大量小型清单文件可能会影响查询计划性能。
-`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+> 支持版本:4.1.0+
**语法:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_manifests();
```
**参数说明:**
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `branch` | String | 是 | 要推进的分支名称 |
-| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+无参数
**返回值:**
-执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+执行 `rewrite_manifests` 操作会返回一个结果集,包含以下 2 列:
| 列名 | 类型 | 描述 |
| ---- | ---- | ---- |
-| `branch_updated` | STRING | 被推进的分支名称 |
-| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
-| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+| `rewritten_manifests_count` | BIGINT | 被重写的文件数量 |
+| `total_data_manifests_count` | BIGINT | 被重写的数据清单文件数量 |
**示例:**
```sql
--- 将 feature 分支推进到 main 分支的最新快照
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
```
-**注意事项:**
-
-1. 该操作不支持 WHERE 条件
-2. 如果指定的分支不存在,操作会失败
-3. 只能推进分支,不能推进标签(Tag)
-
### rollback_to_snapshot
`rollback_to_snapshot` 操作用于将 Iceberg 表回滚到指定的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2276,6 +2360,8 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
`rollback_to_timestamp` 操作用于将 Iceberg 表回滚到指定时间点的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2320,6 +2406,8 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
`set_current_snapshot` 操作用于将 Iceberg 表的当前快照设置为指定的快照 ID 或引用(分支或标签)。
+> 支持版本:4.0.2+
+
**语法:**
```sql
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
index 86c73128cae..94c38890d48 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2063,7 +2063,7 @@ DROP DATABASE [IF EXISTS] iceberg.iceberg_db;
## Iceberg 表操作
-> 该功能自 4.0.2 版本开始支持。
+> 该功能自 4.0.2 版本开始支持。具体支持版本参见各个操作的说明
>
> 该功能是实验性功能。
@@ -2079,10 +2079,139 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
* `("key1" = "value1", ...)`:操作相关的参数,以键值对形式提供。参数名称需要使用双引号包裹
* `[WHERE <condition>]`:可选的 WHERE 条件,用于指定操作的分区或数据范围(仅部分操作支持)
+### cherrypick_snapshot
+
+`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | 是 | 要合并的快照 ID |
+
+**返回值:**
+
+执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
+| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
+
+**示例:**
+
+```sql
+-- 将快照 123456789 的变更合并到当前表状态
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的快照不存在,操作会失败
+3. 合并操作会创建一个新的快照,不会删除原始快照
+
+### fast_forward
+
+`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | 是 | 要推进的分支名称 |
+| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+
+**返回值:**
+
+执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | 被推进的分支名称 |
+| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
+| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+
+**示例:**
+
+```sql
+-- 将 feature 分支推进到 main 分支的最新快照
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的分支不存在,操作会失败
+3. 只能推进分支,不能推进标签(Tag)
+
+### publish_changes
+
+`publish_changes` 操作是“写入 - 审核 - 发布” (Write-Audit-Publish)
模式中的“发布”步骤。该过程会查找带有特定 `wap.id` 属性的快照,并将其添加到当前表状态中。这样,用户就可以在验证后原子性地使“暂存”数据可见。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | 是 | 要进行发布的快照的 `wap.id` |
+
+**返回值:**
+
+执行 `publish_changes` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | 发布之前的快照 ID |
+| `current_snapshot_id` | BIGINT | 发布之后的新的快照 ID |
+
+**示例:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的 `wap.id` 不存在,操作会失败
+
+
### rewrite_data_files
`rewrite_data_files` 操作用于对 Iceberg
表进行数据文件重写,实现小文件合并(Compaction)功能,以优化查询性能和存储效率。该操作是一个同步操作。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2150,92 +2279,47 @@ EXECUTE rewrite_data_files (
3. WHERE 条件可用于限制重写的分区或数据范围,这个条件会过滤掉那些不包含 符合 WHERE 条件的数据的文件,从而减少重写的文件数量和数据量
4. 执行前可以通过[重写文件选择逻辑](#重写文件选择逻辑)章节中的 SQL 来计算哪些文件会被重写
-### cherrypick_snapshot
-
-`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
-
-**语法:**
-
-```sql
-ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
-```
-
-**参数说明:**
-
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `snapshot_id` | Long | 是 | 要合并的快照 ID |
-
-**返回值:**
-
-执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
-
-| 列名 | 类型 | 描述 |
-| ---- | ---- | ---- |
-| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
-| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
-
-**示例:**
-
-```sql
--- 将快照 123456789 的变更合并到当前表状态
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
-```
+### rewrite_manifests
-**注意事项:**
+`rewrite_manifests` 操作允许用户通过重写清单文件(Manifest File)来优化 Iceberg
表元数据,从而提高查询性能并减少元数据开销。
-1. 该操作不支持 WHERE 条件
-2. 如果指定的快照不存在,操作会失败
-3. 合并操作会创建一个新的快照,不会删除原始快照
-
-### fast_forward
+这解决了大型 Iceberg 表中清单文件优化的问题,因为大量小型清单文件可能会影响查询计划性能。
-`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+> 支持版本:4.1.0+
**语法:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_manifests();
```
**参数说明:**
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `branch` | String | 是 | 要推进的分支名称 |
-| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+无参数
**返回值:**
-执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+执行 `rewrite_manifests` 操作会返回一个结果集,包含以下 2 列:
| 列名 | 类型 | 描述 |
| ---- | ---- | ---- |
-| `branch_updated` | STRING | 被推进的分支名称 |
-| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
-| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+| `rewritten_manifests_count` | BIGINT | 被重写的文件数量 |
+| `total_data_manifests_count` | BIGINT | 被重写的数据清单文件数量 |
**示例:**
```sql
--- 将 feature 分支推进到 main 分支的最新快照
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
```
-**注意事项:**
-
-1. 该操作不支持 WHERE 条件
-2. 如果指定的分支不存在,操作会失败
-3. 只能推进分支,不能推进标签(Tag)
-
### rollback_to_snapshot
`rollback_to_snapshot` 操作用于将 Iceberg 表回滚到指定的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2276,6 +2360,8 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
`rollback_to_timestamp` 操作用于将 Iceberg 表回滚到指定时间点的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2320,6 +2406,8 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
`set_current_snapshot` 操作用于将 Iceberg 表的当前快照设置为指定的快照 ID 或引用(分支或标签)。
+> 支持版本:4.0.2+
+
**语法:**
```sql
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
index 86c73128cae..94c38890d48 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2063,7 +2063,7 @@ DROP DATABASE [IF EXISTS] iceberg.iceberg_db;
## Iceberg 表操作
-> 该功能自 4.0.2 版本开始支持。
+> 该功能自 4.0.2 版本开始支持。具体支持版本参见各个操作的说明
>
> 该功能是实验性功能。
@@ -2079,10 +2079,139 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
* `("key1" = "value1", ...)`:操作相关的参数,以键值对形式提供。参数名称需要使用双引号包裹
* `[WHERE <condition>]`:可选的 WHERE 条件,用于指定操作的分区或数据范围(仅部分操作支持)
+### cherrypick_snapshot
+
+`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | 是 | 要合并的快照 ID |
+
+**返回值:**
+
+执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
+| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
+
+**示例:**
+
+```sql
+-- 将快照 123456789 的变更合并到当前表状态
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的快照不存在,操作会失败
+3. 合并操作会创建一个新的快照,不会删除原始快照
+
+### fast_forward
+
+`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+
+> 支持版本:4.0.2+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | 是 | 要推进的分支名称 |
+| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+
+**返回值:**
+
+执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | 被推进的分支名称 |
+| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
+| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+
+**示例:**
+
+```sql
+-- 将 feature 分支推进到 main 分支的最新快照
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的分支不存在,操作会失败
+3. 只能推进分支,不能推进标签(Tag)
+
+### publish_changes
+
+`publish_changes` 操作是“写入 - 审核 - 发布” (Write-Audit-Publish)
模式中的“发布”步骤。该过程会查找带有特定 `wap.id` 属性的快照,并将其添加到当前表状态中。这样,用户就可以在验证后原子性地使“暂存”数据可见。
+
+> 支持版本:4.1.0+
+
+**语法:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
+```
+
+**参数说明:**
+
+| 参数名称 | 类型 | 必填 | 描述 |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | 是 | 要进行发布的快照的 `wap.id` |
+
+**返回值:**
+
+执行 `publish_changes` 操作会返回一个结果集,包含以下 2 列:
+
+| 列名 | 类型 | 描述 |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | 发布之前的快照 ID |
+| `current_snapshot_id` | BIGINT | 发布之后的新的快照 ID |
+
+**示例:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
+```
+
+**注意事项:**
+
+1. 该操作不支持 WHERE 条件
+2. 如果指定的 `wap.id` 不存在,操作会失败
+
+
### rewrite_data_files
`rewrite_data_files` 操作用于对 Iceberg
表进行数据文件重写,实现小文件合并(Compaction)功能,以优化查询性能和存储效率。该操作是一个同步操作。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2150,92 +2279,47 @@ EXECUTE rewrite_data_files (
3. WHERE 条件可用于限制重写的分区或数据范围,这个条件会过滤掉那些不包含 符合 WHERE 条件的数据的文件,从而减少重写的文件数量和数据量
4. 执行前可以通过[重写文件选择逻辑](#重写文件选择逻辑)章节中的 SQL 来计算哪些文件会被重写
-### cherrypick_snapshot
-
-`cherrypick_snapshot` 操作用于将指定快照的变更合并到当前表状态,创建一个新的快照而不修改或删除原始快照。
-
-**语法:**
-
-```sql
-ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
-```
-
-**参数说明:**
-
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `snapshot_id` | Long | 是 | 要合并的快照 ID |
-
-**返回值:**
-
-执行 `cherrypick_snapshot` 操作会返回一个结果集,包含以下 2 列:
-
-| 列名 | 类型 | 描述 |
-| ---- | ---- | ---- |
-| `source_snapshot_id` | BIGINT | 被合并的快照 ID |
-| `current_snapshot_id` | BIGINT | 合并操作后新创建并设置为当前快照的快照 ID |
-
-**示例:**
-
-```sql
--- 将快照 123456789 的变更合并到当前表状态
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
-```
+### rewrite_manifests
-**注意事项:**
+`rewrite_manifests` 操作允许用户通过重写清单文件(Manifest File)来优化 Iceberg
表元数据,从而提高查询性能并减少元数据开销。
-1. 该操作不支持 WHERE 条件
-2. 如果指定的快照不存在,操作会失败
-3. 合并操作会创建一个新的快照,不会删除原始快照
-
-### fast_forward
+这解决了大型 Iceberg 表中清单文件优化的问题,因为大量小型清单文件可能会影响查询计划性能。
-`fast_forward` 操作用于将一个分支的当前快照快速推进到另一个分支的最新快照。
+> 支持版本:4.1.0+
**语法:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_manifests();
```
**参数说明:**
-| 参数名称 | 类型 | 必填 | 描述 |
-| -------- | ---- | ---- | ---- |
-| `branch` | String | 是 | 要推进的分支名称 |
-| `to` | String | 是 | 目标分支名称,会将 `branch` 推进到该分支的最新快照 |
+无参数
**返回值:**
-执行 `fast_forward` 操作会返回一个结果集,包含以下 3 列:
+执行 `rewrite_manifests` 操作会返回一个结果集,包含以下 2 列:
| 列名 | 类型 | 描述 |
| ---- | ---- | ---- |
-| `branch_updated` | STRING | 被推进的分支名称 |
-| `previous_ref` | BIGINT | 推进操作前分支指向的快照 ID(可为 NULL) |
-| `updated_ref` | BIGINT | 推进操作后分支指向的快照 ID |
+| `rewritten_manifests_count` | BIGINT | 被重写的文件数量 |
+| `total_data_manifests_count` | BIGINT | 被重写的数据清单文件数量 |
**示例:**
```sql
--- 将 feature 分支推进到 main 分支的最新快照
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
```
-**注意事项:**
-
-1. 该操作不支持 WHERE 条件
-2. 如果指定的分支不存在,操作会失败
-3. 只能推进分支,不能推进标签(Tag)
-
### rollback_to_snapshot
`rollback_to_snapshot` 操作用于将 Iceberg 表回滚到指定的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2276,6 +2360,8 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
`rollback_to_timestamp` 操作用于将 Iceberg 表回滚到指定时间点的快照。
+> 支持版本:4.0.2+
+
**语法:**
```sql
@@ -2320,6 +2406,8 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
`set_current_snapshot` 操作用于将 Iceberg 表的当前快照设置为指定的快照 ID 或引用(分支或标签)。
+> 支持版本:4.0.2+
+
**语法:**
```sql
diff --git a/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
b/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
index 5afe8b5a49f..3a74490f01c 100644
--- a/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/versioned_docs/version-3.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2049,11 +2049,11 @@ Supported operations include:
## Iceberg Table Actions
-> This feature is supported since 4.0.2
+> This feature is supported starting from version 4.0.2. See specific version
requirements for each operation.
>
> This is an experimental feature.
-Doris supports executing specific actions on Iceberg tables through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various actions such as table optimization and snapshot management.
+Doris supports executing specific Iceberg table actions through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various operations such as table optimization and snapshot management.
```sql
ALTER TABLE [catalog.][database.]table_name
@@ -2061,167 +2061,251 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
[WHERE <condition>]
```
-* `action_name`: The name of the action to execute
-* `("key1" = "value1", ...)`: Action-related parameters provided as key-value
pairs. Parameter names must be wrapped in double quotes
+* `action_name`: Name of the operation to execute
+* `("key1" = "value1", ...)`: Operation-related parameters provided as
key-value pairs. Parameter names must be enclosed in double quotes
* `[WHERE <condition>]`: Optional WHERE condition to specify the partition or
data range for the operation (only supported by some operations)
-### rewrite_data_files
+### cherrypick_snapshot
+
+The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
-The `rewrite_data_files` operation rewrites data files in an Iceberg table to
implement small file compaction, optimizing query performance and storage
efficiency. This is a synchronized operation.
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
```
**Parameters:**
-| Parameter Name | Type | Default Value | Description |
-| -------------- | ---- | ------------- | ----------- |
-| `target-file-size-bytes` | Long | 536870912 (512MB) | Target file size in
bytes for output files |
-| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
-| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
-| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewrite. Rewrite only executes when the number of files to rewrite reaches this
threshold |
-| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
-| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in
bytes for a file group. Used to limit the amount of data processed by a single
rewrite task |
-| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewrite |
-| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to
total records required to trigger rewrite (delete records / total records).
Range: 0.0 - 1.0 |
-| `output-spec-id` | Long | 2 | Partition specification ID for output files |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to be merged |
**Return Value:**
-Executing `rewrite_data_files` returns a result set with the following 4
columns:
+Executing the `cherrypick_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `rewritten_data_files_count` | INT | Number of data files that were
rewritten |
-| `added_data_files_count` | INT | Number of new data files that were written |
-| `rewritten_bytes_count` | INT | Number of bytes that were rewritten |
-| `removed_delete_files_count` | BIGINT | Number of delete files that were
removed |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | The merged snapshot ID |
+| `current_snapshot_id` | BIGINT | The snapshot ID newly created and set as
current after the merge operation |
-**Examples:**
+**Example:**
```sql
--- Rewrite data files with default parameters
-ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
-
--- Specify target file size and minimum input files
+-- Merge changes from snapshot 123456789 into the current table state
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "134217728",
- "min-input-files" = "10"
-);
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
--- Use WHERE condition to rewrite specific partition data
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "268435456"
-)
-WHERE date_col = '2024-01-01';
+**Notes:**
--- Rewrite all files
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified snapshot does not exist
+3. The merge operation creates a new snapshot and does not delete the original
snapshot
+
+### fast_forward
+
+The `fast_forward` operation quickly advances the current snapshot of one
branch to the latest snapshot of another branch.
+
+> Supported version: 4.0.2+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | Yes | Name of the branch to advance |
+| `to` | String | Yes | Target branch name; `branch` will be advanced to the
latest snapshot of this branch |
+
+**Return Value:**
+
+Executing the `fast_forward` operation returns a result set with the following
3 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | The name of the branch that was advanced |
+| `previous_ref` | BIGINT | The snapshot ID the branch pointed to before
advancing (can be NULL) |
+| `updated_ref` | BIGINT | The snapshot ID the branch points to after
advancing |
+
+**Example:**
+
+```sql
+-- Advance the feature branch to the latest snapshot of the main branch
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "rewrite-all" = "true",
- "target-file-size-bytes" = "536870912"
-);
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
```
**Notes:**
-1. The `rewrite_data_files` operation reads and rewrites data files, which
will generate additional I/O and computational overhead. Please allocate
cluster resources reasonably.
-2. Before execution, you can use the SQL in the [View Data File
Distribution](#view-data-file-distribution) section to evaluate whether a
rewrite operation is needed
-3. The WHERE condition can be used to limit the partition or data range for
rewriting. This condition will filter out files that do not contain data
matching the WHERE condition, thereby reducing the number of files and amount
of data to rewrite
-4. Before execution, you can use the SQL in the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified branch does not exist
+3. Only branches can be advanced, not tags
-### cherrypick_snapshot
+### publish_changes
-The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
+The `publish_changes` operation is the "Publish" step in the
Write-Audit-Publish (WAP) pattern. This process finds snapshots with a specific
`wap.id` attribute and adds them to the current table state. This allows users
to atomically make "staged" data visible after validation.
+
+> Supported version: 4.1.0+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to merge |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | Yes | The `wap.id` of the snapshot to publish |
**Return Value:**
-Executing `cherrypick_snapshot` returns a result set with the following 2
columns:
+Executing the `publish_changes` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `source_snapshot_id` | BIGINT | The snapshot ID whose changes were merged |
-| `current_snapshot_id` | BIGINT | The snapshot ID of the new snapshot created
and set as current after the merge operation |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The snapshot ID before publishing |
+| `current_snapshot_id` | BIGINT | The new snapshot ID after publishing |
**Example:**
```sql
--- Merge changes from snapshot 123456789 into current table state
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
```
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
-3. The merge operation creates a new snapshot and does not delete the original
snapshot
+2. The operation will fail if the specified `wap.id` does not exist
-### fast_forward
-The `fast_forward` operation fast-forwards the current snapshot of one branch
to the latest snapshot of another branch.
+### rewrite_data_files
+
+The `rewrite_data_files` operation rewrites data files in an Iceberg table to
achieve small file compaction, optimizing query performance and storage
efficiency. This is a synchronous operation.
+
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `branch` | String | Yes | Name of the branch to fast-forward |
-| `to` | String | Yes | Target branch name. The `branch` will be
fast-forwarded to the latest snapshot of this branch |
+| Parameter | Type | Default Value | Description |
+| -------- | ---- | ------ | ---- |
+| `target-file-size-bytes` | Long | 536870912 (512MB) | Target file size in
bytes for output files |
+| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
+| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
+| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewriting. Rewriting will only execute when the number of files to rewrite
reaches this threshold |
+| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
+| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size of
a single file group in bytes. Used to limit the amount of data processed by a
single rewrite task |
+| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewriting |
+| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of deleted records
required to trigger rewriting (deleted records / total records). Range: 0.0 -
1.0 |
+| `output-spec-id` | Long | 2 | Partition spec ID for output files |
**Return Value:**
-Executing `fast_forward` returns a result set with the following 3 columns:
+Executing the `rewrite_data_files` operation returns a result set with the
following 4 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `branch_updated` | STRING | Name of the branch that was fast-forwarded |
-| `previous_ref` | BIGINT | Snapshot ID the branch was pointing to before the
fast-forward operation (can be NULL) |
-| `updated_ref` | BIGINT | Snapshot ID the branch is pointing to after the
fast-forward operation |
+| ---- | ---- | ---- |
+| `rewritten_data_files_count` | INT | Number of data files rewritten |
+| `added_data_files_count` | INT | Number of new data files added |
+| `rewritten_bytes_count` | INT | Number of bytes rewritten |
+| `removed_delete_files_count` | BIGINT | Number of delete files removed |
-**Example:**
+**Examples:**
```sql
--- Fast-forward feature branch to the latest snapshot of main branch
+-- Rewrite data files with default parameters
+ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
+
+-- Specify target file size and minimum input files
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "134217728",
+ "min-input-files" = "10"
+);
+
+-- Rewrite data in specific partitions using WHERE condition
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "268435456"
+)
+WHERE date_col = '2024-01-01';
+
+-- Rewrite all files
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "rewrite-all" = "true",
+ "target-file-size-bytes" = "536870912"
+);
```
**Notes:**
-1. This operation does not support WHERE conditions
-2. If the specified branch does not exist, the operation will fail
-3. Only branches can be fast-forwarded, not tags
+1. The `rewrite_data_files` operation reads and rewrites data files, which
incurs additional I/O and computing overhead. Please allocate cluster resources
appropriately.
+2. Before execution, you can use SQL from the [View Data File Distribution](#view-data-file-distribution) section to evaluate whether rewriting is necessary (a rough pre-check is sketched after this list)
+3. WHERE conditions can be used to limit the partitions or data range for
rewriting. This condition filters out files that don't contain data matching
the WHERE condition, reducing the number of files and amount of data to rewrite
+4. Before execution, you can use SQL from the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+
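+As a rough illustration of note 2, the sketch below estimates how many small data files a table holds. It assumes the Iceberg `$files` system table is queryable in your Doris version and uses the standard Iceberg metadata columns `content` and `file_size_in_bytes`; the [View Data File Distribution](#view-data-file-distribution) section has the authoritative SQL.
+
+```sql
+-- Count data files smaller than 75% of the default 512 MB target size.
+-- A large count relative to the total suggests rewrite_data_files will help.
+SELECT
+    count(*)                AS small_data_files,
+    sum(file_size_in_bytes) AS small_file_bytes
+FROM iceberg_db.iceberg_table$files
+WHERE content = 0                              -- 0 = data files (not delete files)
+  AND file_size_in_bytes < 536870912 * 0.75;   -- the min-file-size-bytes default
+```
+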
+### rewrite_manifests
+
+The `rewrite_manifests` operation allows users to optimize Iceberg table
metadata by rewriting manifest files, improving query performance and reducing
metadata overhead.
+
+This addresses manifest proliferation in large Iceberg tables, where a large number of small manifest files can slow down query planning.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE rewrite_manifests();
+```
+
+**Parameters:**
+
+No parameters
+
+**Return Value:**
+
+Executing the `rewrite_manifests` operation returns a result set with the
following 2 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `rewritten_manifests_count` | BIGINT | Number of manifest files rewritten |
+| `total_data_manifests_count` | BIGINT | Total number of data manifest files in the table |
+
+**Example:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
+```
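+
+To judge whether this is worthwhile, you can first check how many manifest files the table currently tracks. A minimal pre-check sketch, assuming the Iceberg `$manifests` system table is exposed in your Doris version (column names follow the standard Iceberg metadata table):
+
+```sql
+-- Many small manifests (for example, hundreds of files of only a few KB or MB)
+-- indicate that rewrite_manifests can speed up query planning.
+SELECT count(*)    AS manifest_count,
+       avg(length) AS avg_manifest_bytes
+FROM iceberg_db.iceberg_table$manifests;
+```
+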
### rollback_to_snapshot
The `rollback_to_snapshot` operation rolls back an Iceberg table to a
specified snapshot.
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2231,23 +2315,23 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"<snapshot_id>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to rollback to |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to roll back to |
**Return Value:**
-Executing `rollback_to_snapshot` returns a result set with the following 2
columns:
+Executing the `rollback_to_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is set as current
after rolling back to the specified snapshot |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID set as current after
rollback |
**Example:**
```sql
--- Rollback to snapshot 987654321
+-- Roll back to snapshot 987654321
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_snapshot ("snapshot_id" = "987654321");
```
@@ -2255,12 +2339,14 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
+2. The operation will fail if the specified snapshot does not exist
3. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
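+
+Before rolling back, you can optionally preview the table as of the target snapshot with Iceberg time travel, so you know which state you are restoring. A sketch, assuming `FOR VERSION AS OF` is supported for Iceberg tables in your Doris version:
+
+```sql
+-- Inspect the row count at the target snapshot before rolling back to it.
+SELECT count(*) FROM iceberg_db.iceberg_table FOR VERSION AS OF 987654321;
+```
+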
### rollback_to_timestamp
-The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified timestamp.
+The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified point in time.
+
+> Supported version: 4.0.2+
**Syntax:**
@@ -2271,27 +2357,27 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"<timestamp>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `timestamp` | String | Yes | The timestamp to rollback to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br />2. Milliseconds timestamp (e.g.,
`1704067200000`) |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `timestamp` | String | Yes | The point in time to roll back to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br/>2. Millisecond timestamp (e.g., `1704067200000`)
|
**Return Value:**
-Executing `rollback_to_timestamp` returns a result set with the following 2
columns:
+Executing the `rollback_to_timestamp` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that was current at the
specified timestamp, now set as current |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID at the specified point in
time, now set as current |
**Examples:**
```sql
--- Rollback to specified timestamp (ISO format)
+-- Roll back to specified point in time (ISO format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "2024-01-01 10:30:00.000");
--- Rollback to specified timestamp (timestamp format)
+-- Roll back to specified point in time (timestamp format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "1704067200000");
```
@@ -2299,13 +2385,15 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
**Notes:**
1. This operation does not support WHERE conditions
-2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd
HH:mm:ss.SSS`) or milliseconds timestamp format
-3. If there is no snapshot corresponding to the specified timestamp, the
operation will fail
+2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd HH:mm:ss.SSS`) or millisecond timestamp format (a conversion sketch follows this list)
+3. The operation will fail if there is no corresponding snapshot at the
specified point in time
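+
+If you only have a wall-clock time, a simple conversion sketch for the millisecond form (assuming the MySQL-compatible `unix_timestamp()` function, which returns seconds in the session time zone):
+
+```sql
+-- Convert a datetime to a millisecond timestamp for rollback_to_timestamp.
+-- The result depends on the session time zone.
+SELECT unix_timestamp('2024-01-01 10:30:00') * 1000 AS timestamp_ms;
+
+-- Substitute the returned value, e.g.:
+-- ALTER TABLE iceberg_db.iceberg_table
+-- EXECUTE rollback_to_timestamp ("timestamp" = "<timestamp_ms>");
+```
+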
### set_current_snapshot
The `set_current_snapshot` operation sets the current snapshot of an Iceberg
table to a specified snapshot ID or reference (branch or tag).
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2315,19 +2403,19 @@ EXECUTE set_current_snapshot ("snapshot_id" =
"<snapshot_id>" | "ref" = "<ref_na
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
| `snapshot_id` | Long | Either one | The snapshot ID to set as current |
| `ref` | String | Either one | The reference name (branch or tag) to set as
current |
**Return Value:**
-Executing `set_current_snapshot` returns a result set with the following 2
columns:
+Executing the `set_current_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
setting the new current snapshot |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is now set as current
(from `snapshot_id` parameter or resolved from `ref` parameter) |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before the
operation |
+| `current_snapshot_id` | BIGINT | The snapshot ID after the operation (from
`snapshot_id` parameter or resolved from `ref` parameter) |
**Examples:**
@@ -2349,7 +2437,7 @@ EXECUTE set_current_snapshot ("ref" = "v1.0");
1. This operation does not support WHERE conditions
2. The `snapshot_id` and `ref` parameters are mutually exclusive; only one can
be specified
-3. If the specified snapshot ID or reference does not exist, the operation
will fail
+3. The operation will fail if the specified snapshot ID or reference does not
exist
4. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
## Iceberg Table Optimization
diff --git a/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
b/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
index 5afe8b5a49f..3a74490f01c 100644
--- a/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
+++ b/versioned_docs/version-4.x/lakehouse/catalogs/iceberg-catalog.mdx
@@ -2049,11 +2049,11 @@ Supported operations include:
## Iceberg Table Actions
-> This feature is supported since 4.0.2
+> This feature is supported starting from version 4.0.2. See specific version
requirements for each operation.
>
> This is an experimental feature.
-Doris supports executing specific actions on Iceberg tables through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various actions such as table optimization and snapshot management.
+Doris supports executing specific Iceberg table actions through the `ALTER
TABLE EXECUTE` syntax. This syntax provides a unified interface for executing
various operations such as table optimization and snapshot management.
```sql
ALTER TABLE [catalog.][database.]table_name
@@ -2061,167 +2061,251 @@ EXECUTE <action_name> ("key1" = "value1", "key2" =
"value2", ...)
[WHERE <condition>]
```
-* `action_name`: The name of the action to execute
-* `("key1" = "value1", ...)`: Action-related parameters provided as key-value
pairs. Parameter names must be wrapped in double quotes
+* `action_name`: Name of the operation to execute
+* `("key1" = "value1", ...)`: Operation-related parameters provided as
key-value pairs. Parameter names must be enclosed in double quotes
* `[WHERE <condition>]`: Optional WHERE condition to specify the partition or
data range for the operation (only supported by some operations)
-### rewrite_data_files
+### cherrypick_snapshot
+
+The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
-The `rewrite_data_files` operation rewrites data files in an Iceberg table to
implement small file compaction, optimizing query performance and storage
efficiency. This is a synchronized operation.
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
+EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
```
**Parameters:**
-| Parameter Name | Type | Default Value | Description |
-| -------------- | ---- | ------------- | ----------- |
-| `target-file-size-bytes` | Long | 536870912 (512MB) | Target file size in
bytes for output files |
-| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
-| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
-| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewrite. Rewrite only executes when the number of files to rewrite reaches this
threshold |
-| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
-| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size in
bytes for a file group. Used to limit the amount of data processed by a single
rewrite task |
-| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewrite |
-| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of delete records to
total records required to trigger rewrite (delete records / total records).
Range: 0.0 - 1.0 |
-| `output-spec-id` | Long | 2 | Partition specification ID for output files |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to be merged |
**Return Value:**
-Executing `rewrite_data_files` returns a result set with the following 4
columns:
+Executing the `cherrypick_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `rewritten_data_files_count` | INT | Number of data files that were
rewritten |
-| `added_data_files_count` | INT | Number of new data files that were written |
-| `rewritten_bytes_count` | INT | Number of bytes that were rewritten |
-| `removed_delete_files_count` | BIGINT | Number of delete files that were
removed |
+| ---- | ---- | ---- |
+| `source_snapshot_id` | BIGINT | The ID of the snapshot whose changes were merged |
+| `current_snapshot_id` | BIGINT | The ID of the new snapshot created and set as current by the merge |
-**Examples:**
+**Example:**
```sql
--- Rewrite data files with default parameters
-ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
-
--- Specify target file size and minimum input files
+-- Merge changes from snapshot 123456789 into the current table state
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "134217728",
- "min-input-files" = "10"
-);
+EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+```
--- Use WHERE condition to rewrite specific partition data
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "target-file-size-bytes" = "268435456"
-)
-WHERE date_col = '2024-01-01';
+**Notes:**
--- Rewrite all files
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified snapshot does not exist
+3. The merge operation creates a new snapshot and does not delete the original
snapshot
+
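+To find the snapshot to cherry-pick, you can list the table's snapshots first. A lookup sketch, assuming the `iceberg_meta` table function is available in your Doris version (`iceberg_catalog` below stands for your catalog name, and the exact column set may vary):
+
+```sql
+-- List snapshots with commit time and operation type, newest first,
+-- to locate the snapshot_id to merge.
+SELECT snapshot_id, committed_at, operation
+FROM iceberg_meta(
+    "table" = "iceberg_catalog.iceberg_db.iceberg_table",
+    "query_type" = "snapshots"
+)
+ORDER BY committed_at DESC;
+```
+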
+### fast_forward
+
+The `fast_forward` operation quickly advances the current snapshot of one
branch to the latest snapshot of another branch.
+
+> Supported version: 4.0.2+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `branch` | String | Yes | Name of the branch to advance |
+| `to` | String | Yes | Target branch name; `branch` will be advanced to the
latest snapshot of this branch |
+
+**Return Value:**
+
+Executing the `fast_forward` operation returns a result set with the following
3 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `branch_updated` | STRING | The name of the branch that was advanced |
+| `previous_ref` | BIGINT | The snapshot ID the branch pointed to before
advancing (can be NULL) |
+| `updated_ref` | BIGINT | The snapshot ID the branch points to after
advancing |
+
+**Example:**
+
+```sql
+-- Advance the feature branch to the latest snapshot of the main branch
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE rewrite_data_files (
- "rewrite-all" = "true",
- "target-file-size-bytes" = "536870912"
-);
+EXECUTE fast_forward ("branch" = "feature", "to" = "main");
```
**Notes:**
-1. The `rewrite_data_files` operation reads and rewrites data files, which
will generate additional I/O and computational overhead. Please allocate
cluster resources reasonably.
-2. Before execution, you can use the SQL in the [View Data File
Distribution](#view-data-file-distribution) section to evaluate whether a
rewrite operation is needed
-3. The WHERE condition can be used to limit the partition or data range for
rewriting. This condition will filter out files that do not contain data
matching the WHERE condition, thereby reducing the number of files and amount
of data to rewrite
-4. Before execution, you can use the SQL in the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+1. This operation does not support WHERE conditions
+2. The operation will fail if the specified branch does not exist (a pre-check sketch follows this list)
+3. Only branches can be advanced, not tags
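+
+As a pre-check for notes 2 and 3, you can confirm that both references exist and that the one being advanced is a branch rather than a tag. A sketch, assuming the Iceberg `$refs` system table is exposed in your Doris version:
+
+```sql
+-- Both names should appear, and the reference being advanced must be of type BRANCH.
+SELECT name, type, snapshot_id
+FROM iceberg_db.iceberg_table$refs
+WHERE name IN ('feature', 'main');
+```
+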
-### cherrypick_snapshot
+### publish_changes
-The `cherrypick_snapshot` operation merges changes from a specified snapshot
into the current table state, creating a new snapshot without modifying or
deleting the original snapshot.
+The `publish_changes` operation is the "Publish" step in the
Write-Audit-Publish (WAP) pattern. This process finds snapshots with a specific
`wap.id` attribute and adds them to the current table state. This allows users
to atomically make "staged" data visible after validation.
+
+> Supported version: 4.1.0+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE cherrypick_snapshot ("snapshot_id" = "<snapshot_id>")
+EXECUTE publish_changes ("wap_id" = "<wap-id>")
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to merge |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `wap_id` | String | Yes | The `wap.id` of the snapshot to publish |
**Return Value:**
-Executing `cherrypick_snapshot` returns a result set with the following 2
columns:
+Executing the `publish_changes` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `source_snapshot_id` | BIGINT | The snapshot ID whose changes were merged |
-| `current_snapshot_id` | BIGINT | The snapshot ID of the new snapshot created
and set as current after the merge operation |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The snapshot ID before publishing |
+| `current_snapshot_id` | BIGINT | The new snapshot ID after publishing |
**Example:**
```sql
--- Merge changes from snapshot 123456789 into current table state
-ALTER TABLE iceberg_db.iceberg_table
-EXECUTE cherrypick_snapshot ("snapshot_id" = "123456789");
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE publish_changes ("wap_id" = "branch-123");
```
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
-3. The merge operation creates a new snapshot and does not delete the original
snapshot
+2. The operation will fail if the specified `wap.id` does not exist
-### fast_forward
-The `fast_forward` operation fast-forwards the current snapshot of one branch
to the latest snapshot of another branch.
+### rewrite_data_files
+
+The `rewrite_data_files` operation rewrites data files in an Iceberg table to
achieve small file compaction, optimizing query performance and storage
efficiency. This is a synchronous operation.
+
+> Supported version: 4.0.2+
**Syntax:**
```sql
ALTER TABLE [catalog.][database.]table_name
-EXECUTE fast_forward ("branch" = "<branch_name>", "to" = "<target_branch>")
+EXECUTE rewrite_data_files ("key1" = "value1", "key2" = "value2", ...) [WHERE
<condition>]
```
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `branch` | String | Yes | Name of the branch to fast-forward |
-| `to` | String | Yes | Target branch name. The `branch` will be
fast-forwarded to the latest snapshot of this branch |
+| Parameter | Type | Default Value | Description |
+| -------- | ---- | ------ | ---- |
+| `target-file-size-bytes` | Long | 536870912 (512MB) | Target size of output files, in bytes |
+| `min-file-size-bytes` | Long | target-file-size-bytes * 0.75 | Minimum file
size in bytes. Files smaller than this will be rewritten. If not specified,
defaults to 75% of target file size |
+| `max-file-size-bytes` | Long | target-file-size-bytes * 1.8 | Maximum file
size in bytes. Files larger than this will be split. If not specified, defaults
to 180% of target file size |
+| `min-input-files` | Integer | 5 | Minimum number of input files to trigger
rewriting. Rewriting will only execute when the number of files to rewrite
reaches this threshold |
+| `rewrite-all` | Boolean | false | Whether to rewrite all files regardless of
file size |
+| `max-file-group-size-bytes` | Long | 107374182400 (100GB) | Maximum size of
a single file group in bytes. Used to limit the amount of data processed by a
single rewrite task |
+| `delete-file-threshold` | Integer | Integer.MAX_VALUE | Minimum number of
delete files required to trigger rewriting |
+| `delete-ratio-threshold` | Double | 0.3 | Minimum ratio of deleted records
required to trigger rewriting (deleted records / total records). Range: 0.0 -
1.0 |
+| `output-spec-id` | Long | 2 | Partition spec ID for output files |
**Return Value:**
-Executing `fast_forward` returns a result set with the following 3 columns:
+Executing the `rewrite_data_files` operation returns a result set with the
following 4 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `branch_updated` | STRING | Name of the branch that was fast-forwarded |
-| `previous_ref` | BIGINT | Snapshot ID the branch was pointing to before the
fast-forward operation (can be NULL) |
-| `updated_ref` | BIGINT | Snapshot ID the branch is pointing to after the
fast-forward operation |
+| ---- | ---- | ---- |
+| `rewritten_data_files_count` | INT | Number of data files rewritten |
+| `added_data_files_count` | INT | Number of new data files added |
+| `rewritten_bytes_count` | INT | Number of bytes rewritten |
+| `removed_delete_files_count` | BIGINT | Number of delete files removed |
-**Example:**
+**Examples:**
```sql
--- Fast-forward feature branch to the latest snapshot of main branch
+-- Rewrite data files with default parameters
+ALTER TABLE iceberg_db.iceberg_table EXECUTE rewrite_data_files();
+
+-- Specify target file size and minimum input files
ALTER TABLE iceberg_db.iceberg_table
-EXECUTE fast_forward ("branch" = "feature", "to" = "main");
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "134217728",
+ "min-input-files" = "10"
+);
+
+-- Rewrite data in specific partitions using WHERE condition
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "target-file-size-bytes" = "268435456"
+)
+WHERE date_col = '2024-01-01';
+
+-- Rewrite all files
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_data_files (
+ "rewrite-all" = "true",
+ "target-file-size-bytes" = "536870912"
+);
```
**Notes:**
-1. This operation does not support WHERE conditions
-2. If the specified branch does not exist, the operation will fail
-3. Only branches can be fast-forwarded, not tags
+1. The `rewrite_data_files` operation reads and rewrites data files, which
incurs additional I/O and computing overhead. Please allocate cluster resources
appropriately.
+2. Before execution, you can use SQL from the [View Data File
Distribution](#view-data-file-distribution) section to evaluate whether
rewriting is necessary
+3. WHERE conditions can be used to limit the partitions or data range for
rewriting. This condition filters out files that don't contain data matching
the WHERE condition, reducing the number of files and amount of data to rewrite
+4. Before execution, you can use SQL from the [Rewrite File Selection
Logic](#rewrite-file-selection-logic) section to calculate which files will be
rewritten
+
+### rewrite_manifests
+
+The `rewrite_manifests` operation allows users to optimize Iceberg table
metadata by rewriting manifest files, improving query performance and reducing
metadata overhead.
+
+This addresses manifest proliferation in large Iceberg tables, where a large number of small manifest files can slow down query planning.
+
+> Supported version: 4.1.0+
+
+**Syntax:**
+
+```sql
+ALTER TABLE [catalog.][database.]table_name
+EXECUTE rewrite_manifests();
+```
+
+**Parameters:**
+
+No parameters
+
+**Return Value:**
+
+Executing the `rewrite_manifests` operation returns a result set with the
following 2 columns:
+
+| Column Name | Type | Description |
+| ---- | ---- | ---- |
+| `rewritten_manifests_count` | BIGINT | Number of manifest files rewritten |
+| `total_data_manifests_count` | BIGINT | Total number of data manifest files in the table |
+
+**Example:**
+
+```sql
+ALTER TABLE iceberg_db.iceberg_table
+EXECUTE rewrite_manifests();
+```
### rollback_to_snapshot
The `rollback_to_snapshot` operation rolls back an Iceberg table to a
specified snapshot.
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2231,23 +2315,23 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"<snapshot_id>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `snapshot_id` | Long | Yes | The snapshot ID to rollback to |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `snapshot_id` | Long | Yes | The snapshot ID to roll back to |
**Return Value:**
-Executing `rollback_to_snapshot` returns a result set with the following 2
columns:
+Executing the `rollback_to_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is set as current
after rolling back to the specified snapshot |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID set as current after
rollback |
**Example:**
```sql
--- Rollback to snapshot 987654321
+-- Roll back to snapshot 987654321
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_snapshot ("snapshot_id" = "987654321");
```
@@ -2255,12 +2339,14 @@ EXECUTE rollback_to_snapshot ("snapshot_id" =
"987654321");
**Notes:**
1. This operation does not support WHERE conditions
-2. If the specified snapshot does not exist, the operation will fail
+2. The operation will fail if the specified snapshot does not exist
3. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
### rollback_to_timestamp
-The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified timestamp.
+The `rollback_to_timestamp` operation rolls back an Iceberg table to the
snapshot at a specified point in time.
+
+> Supported version: 4.0.2+
**Syntax:**
@@ -2271,27 +2357,27 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"<timestamp>")
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
-| `timestamp` | String | Yes | The timestamp to rollback to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br />2. Milliseconds timestamp (e.g.,
`1704067200000`) |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
+| `timestamp` | String | Yes | The point in time to roll back to. Supports two
formats:<br />1. ISO datetime format: `yyyy-MM-dd HH:mm:ss.SSS` (e.g.,
`2024-01-01 10:30:00.000`)<br/>2. Millisecond timestamp (e.g., `1704067200000`)
|
**Return Value:**
-Executing `rollback_to_timestamp` returns a result set with the following 2
columns:
+Executing the `rollback_to_timestamp` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
the rollback operation |
-| `current_snapshot_id` | BIGINT | The snapshot ID that was current at the
specified timestamp, now set as current |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before
rollback |
+| `current_snapshot_id` | BIGINT | The snapshot ID at the specified point in
time, now set as current |
**Examples:**
```sql
--- Rollback to specified timestamp (ISO format)
+-- Roll back to specified point in time (ISO format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "2024-01-01 10:30:00.000");
--- Rollback to specified timestamp (timestamp format)
+-- Roll back to specified point in time (timestamp format)
ALTER TABLE iceberg_db.iceberg_table
EXECUTE rollback_to_timestamp ("timestamp" = "1704067200000");
```
@@ -2299,13 +2385,15 @@ EXECUTE rollback_to_timestamp ("timestamp" =
"1704067200000");
**Notes:**
1. This operation does not support WHERE conditions
-2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd
HH:mm:ss.SSS`) or milliseconds timestamp format
-3. If there is no snapshot corresponding to the specified timestamp, the
operation will fail
+2. The `timestamp` parameter supports ISO datetime format (`yyyy-MM-dd
HH:mm:ss.SSS`) or millisecond timestamp format
+3. The operation will fail if there is no corresponding snapshot at the
specified point in time
### set_current_snapshot
The `set_current_snapshot` operation sets the current snapshot of an Iceberg
table to a specified snapshot ID or reference (branch or tag).
+> Supported version: 4.0.2+
+
**Syntax:**
```sql
@@ -2315,19 +2403,19 @@ EXECUTE set_current_snapshot ("snapshot_id" =
"<snapshot_id>" | "ref" = "<ref_na
**Parameters:**
-| Parameter Name | Type | Required | Description |
-| -------------- | ---- | -------- | ----------- |
+| Parameter | Type | Required | Description |
+| -------- | ---- | ---- | ---- |
| `snapshot_id` | Long | Either one | The snapshot ID to set as current |
| `ref` | String | Either one | The reference name (branch or tag) to set as
current |
**Return Value:**
-Executing `set_current_snapshot` returns a result set with the following 2
columns:
+Executing the `set_current_snapshot` operation returns a result set with the
following 2 columns:
| Column Name | Type | Description |
-| ----------- | ---- | ----------- |
-| `previous_snapshot_id` | BIGINT | The snapshot ID that was current before
setting the new current snapshot |
-| `current_snapshot_id` | BIGINT | The snapshot ID that is now set as current
(from `snapshot_id` parameter or resolved from `ref` parameter) |
+| ---- | ---- | ---- |
+| `previous_snapshot_id` | BIGINT | The ID of the current snapshot before the
operation |
+| `current_snapshot_id` | BIGINT | The snapshot ID after the operation (from
`snapshot_id` parameter or resolved from `ref` parameter) |
**Examples:**
@@ -2349,7 +2437,7 @@ EXECUTE set_current_snapshot ("ref" = "v1.0");
1. This operation does not support WHERE conditions
2. The `snapshot_id` and `ref` parameters are mutually exclusive; only one can
be specified
-3. If the specified snapshot ID or reference does not exist, the operation
will fail
+3. The operation will fail if the specified snapshot ID or reference does not
exist
4. If the current snapshot is already the target snapshot, the operation
returns directly without creating a new snapshot
## Iceberg Table Optimization
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]