This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 12b1f52ea4 Spec: Allow the use of `source-id` in V3 (#12644)
12b1f52ea4 is described below
commit 12b1f52ea46b423f80c5bd83f7cf1d6b1db519f4
Author: Fokko Driesprong <[email protected]>
AuthorDate: Tue Apr 22 10:29:44 2025 +0200
Spec: Allow the use of `source-id` in V3 (#12644)
---
format/spec.md | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/format/spec.md b/format/spec.md
index cfca26239f..7dec296200 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -1430,12 +1430,16 @@ Each partition field in `fields` is stored as a JSON object with the following p
| V1 | V2 | V3 | Field | JSON representation | Example |
|----------|----------|----------|------------------|---------------------|--------------|
-| required | required | omitted | **`source-id`** | `JSON int` | 1 |
-| | | required | **`source-ids`** | `JSON list of ints` | `[1,2]` |
+| required | required | optional | **`source-id`** | `JSON int` | 1 |
+| | | optional | **`source-ids`** | `JSON list of ints` | `[1,2]` |
| | required | required | **`field-id`** | `JSON int` | 1000 |
| required | required | required | **`name`** | `JSON string` | `id_bucket` |
| required | required | required | **`transform`** | `JSON string` | `bucket[16]` |
+Notes:
+
+1. For partition fields with a transform with a single argument, only `source-id` is written. In case of a multi-argument transform, only `source-ids` is written.
+
Supported partition transforms are listed below.
|Transform or Field|JSON representation|Example|
@@ -1470,12 +1474,14 @@ Each sort field in the fields list is stored as an object with the following pro
| V1 | V2 | V3 | Field | JSON representation | Example |
|----------|----------|----------|------------------|---------------------|-------------|
| required | required | required | **`transform`** | `JSON string` | `bucket[4]` |
-| required | required | omitted | **`source-id`** | `JSON int` | 1 |
-| | | required | **`source-ids`** | `JSON list of ints` | `[1,2]` |
+| required | required | optional | **`source-id`** | `JSON int` | 1 |
+| | | optional | **`source-ids`** | `JSON list of ints` | `[1,2]` |
| required | required | required | **`direction`** | `JSON string` | `asc` |
| required | required | required | **`null-order`** | `JSON string` | `nulls-last`|
-In v3 metadata, writers must use only `source-ids` because v3 requires reader support for multi-arg transforms.
+Notes:
+
+1. For sort fields with a transform with a single argument, only `source-id` is written. In case of a multi-argument transform, only `source-ids` is written.
Older versions of the reference implementation can read tables with transforms unknown to it, ignoring them. But other implementations may break if they encounter unknown transforms. All v3 readers are required to read tables with unknown transforms, ignoring them.
@@ -1622,13 +1628,8 @@ All readers are required to read tables with unknown partition transforms, ignor
Writing v3 metadata:
* Partition Field and Sort Field JSON:
- * `source-ids` was added and is required
- * `source-id` is no longer required and should be omitted; always use `source-ids` instead
-
-Reading v1 or v2 metadata for v3:
-
-* Partition Field and Sort Field JSON:
- * `source-ids` should default to a single-value list of the value of `source-id`
+ * `source-ids` was added and must be written in the case of a multi-argument transform.
+ * `source-id` must be written in the case of single-argument transforms.
Row-level delete changes:
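
Illustrative note (not part of the commit): the writing rule this change introduces for v3 partition fields can be sketched as below. This is a minimal sketch under stated assumptions, not code from any Iceberg implementation: the helper names `partition_field_json` and `read_source_ids` are hypothetical, `some-multi-arg-transform` is a placeholder transform name, and the reader-side normalization is one possible convention rather than something the diff mandates.

```python
import json


def partition_field_json(name, field_id, transform, source_ids):
    """Build the JSON object for a v3 partition field (hypothetical helper)."""
    if not source_ids:
        raise ValueError("a partition field needs at least one source column id")
    field = {}
    if len(source_ids) == 1:
        # Single-argument transform: only `source-id` is written.
        field["source-id"] = source_ids[0]
    else:
        # Multi-argument transform: only `source-ids` is written.
        field["source-ids"] = list(source_ids)
    field["field-id"] = field_id
    field["name"] = name
    field["transform"] = transform
    return field


def read_source_ids(field):
    """Return the source ids as a list, accepting either key (assumed convention)."""
    if "source-ids" in field:
        return list(field["source-ids"])
    return [field["source-id"]]


single = partition_field_json("id_bucket", 1000, "bucket[16]", [1])
# "some-multi-arg-transform" is a placeholder, not a transform defined by the spec.
multi = partition_field_json("multi_field", 1001, "some-multi-arg-transform", [1, 2])
print(json.dumps(single))
print(json.dumps(multi))
```

Writing `source-id` for single-argument transforms keeps the field readable by code that only knows the V1/V2 key, which appears to be the motivation for making both keys optional in V3.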