Re: [PR] Spec: V4 Adaptive Metadata Tree Spec Changes for Entry Structures [iceberg]

via GitHub Tue, 12 May 2026 07:53:37 -0700


RussellSpitzer commented on code in PR #16025:
URL: https://github.com/apache/iceberg/pull/16025#discussion_r3227430618



##########
format/spec.md:
##########
@@ -590,85 +592,182 @@ A data or delete file is associated with a sort order by 
the sort order's id wit
 
 ### Manifests
 
-A manifest is an immutable Avro file that lists data files or delete files, 
along with each file’s partition data tuple, metrics, and tracking information. 
One or more manifest files are used to store a [snapshot](#snapshots), which 
tracks all of the files in a table at some point in time. Manifests are tracked 
by a [manifest list](#manifest-lists) for each table snapshot.
+A manifest is an immutable file that lists data files or delete files, along 
with each file’s partition data, metrics, and tracking information. One or more 
manifest files are used to store a [snapshot](#snapshots), which tracks all of 
the files in a table at some point in time. In V1-V3, manifests are tracked by 
a [manifest list](#manifest-lists) for each table snapshot. In V4, a single 
root manifest per snapshot can directly reference data files, delete files, and 
other data and delete manifests.
 
-A manifest is a valid Iceberg data file: files must use valid Iceberg formats, 
schemas, and column projection.
+Manifests are valid Iceberg data files: files must use valid Iceberg formats, 
schemas, and column projection.
 
 A manifest may store either data files or delete files, but not both because 
manifests that contain delete files are scanned first during job planning. 
Whether a manifest is a data manifest or a delete manifest is stored in 
manifest metadata.
 
-A manifest stores files for a single partition spec. When a table’s partition 
spec changes, old files remain in the older manifest and newer files are 
written to a new manifest. This is required because a manifest file’s schema is 
based on its partition spec (see below). The partition spec of each manifest is 
also used to transform predicates on the table's data rows into predicates on 
partition values that are used during job planning to select files from a 
manifest.
+**Partition Spec Binding:**
 
-A manifest file must store the partition spec and other metadata as properties 
in the Avro file's key-value metadata:
-
-| v1         | v2         | Key                 | Value                        
                                                                                
                               |
-|------------|------------|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
-| _required_ | _required_ | `schema`            | JSON representation of the 
table schema at the time the manifest was written                               
                                 |
-| _optional_ | _required_ | `schema-id`         | ID of the schema used to 
write the manifest as a string                                                  
                                   |
-| _required_ | _required_ | `partition-spec`    | JSON representation of only 
the partition fields array of the partition spec used to write the manifest. 
See [Appendix C](#partition-specs) |
-| _optional_ | _required_ | `partition-spec-id` | ID of the partition spec 
used to write the manifest as a string                                          
                                   |
-| _optional_ | _required_ | `format-version`    | Table format version number 
of the manifest as a string                                                     
                                |
-|            | _required_ | `content`           | Type of content files 
tracked by the manifest: "data" or "deletes"                                    
                                      |
-
-The schema of a manifest file is defined by the `manifest_entry` struct, 
described in the following section.
-
-#### Manifest Entry Fields
-
-The `manifest_entry` struct consists of the following fields:
-
-| v1         | v2         | Field id, name                | Type               
                                       | Description |
-| ---------- | ---------- 
|-------------------------------|-----------------------------------------------------------|-------------|
-| _required_ | _required_ | **`0  status`**               | `int` with 
meaning: `0: EXISTING` `1: ADDED` `2: DELETED` | Used to track additions and 
deletions. Deletes are informational only and not used in scans. |
-| _required_ | _optional_ | **`1  snapshot_id`**          | `long`             
                                       | Snapshot id where the file was added, 
or deleted if status is 2. Inherited when null. |
-|            | _optional_ | **`3  sequence_number`**      | `long`             
                                       | Data sequence number of the file. 
Inherited when null and status is 1 (added). |
-|            | _optional_ | **`4  file_sequence_number`** | `long`             
                                       | File sequence number indicating when 
the file was added. Inherited when null and status is 1 (added). |
-| _required_ | _required_ | **`2  data_file`**            | `data_file` 
`struct` (see below)                          | File path, partition tuple, 
metrics, ... |
-
-The manifest entry fields are used to keep track of the snapshot in which 
files were added or logically deleted. The `data_file` struct, defined below, 
is nested inside the manifest entry so that it can be easily passed to job 
planning without the manifest entry fields.
-
-When a file is added to the dataset, its manifest entry should store the 
snapshot ID in which the file was added and set status to 1 (added).
-
-When a file is replaced or deleted from the dataset, its manifest entry fields 
store the snapshot ID in which the file was deleted and status 2 (deleted). The 
file may be deleted from the file system when the snapshot in which it was 
deleted is garbage collected, assuming that older snapshots have also been 
garbage collected [1].
-
-Iceberg v2 adds data and file sequence numbers to the entry and makes the 
snapshot ID optional. Values for these fields are inherited from manifest 
metadata when `null`. That is, if the field is `null` for an entry, then the 
entry must inherit its value from the manifest file's metadata, stored in the 
manifest list.
-The `sequence_number` field represents the data sequence number and must never 
change after a file is added to the dataset. The data sequence number 
represents a relative age of the file content and should be used for planning 
which delete files apply to a data file.
-The `file_sequence_number` field represents the sequence number of the 
snapshot that added the file and must also remain unchanged upon assigning at 
commit. The file sequence number can't be used for pruning delete files as the 
data within the file may have an older data sequence number.
-The data and file sequence numbers are inherited only if the entry status is 1 
(added). If the entry status is 0 (existing) or 2 (deleted), the entry must 
include both sequence numbers explicitly.
-
-Notes:
-
-1. Technically, data files can be deleted when the last snapshot that contains 
the file as “live” data is garbage collected. But this is harder to detect and 
requires finding the diff of multiple snapshots. It is easier to track what 
files are deleted in a snapshot and delete them when that snapshot expires.  It 
is not recommended to add a deleted file back to a table. Adding a deleted file 
can lead to edge cases where incremental deletes can break table snapshots.
-2. Manifest list files are required in v2, so that the `sequence_number` and 
`snapshot_id` to inherit are always available.
+- V1-V3: A manifest stores files for a single partition spec. When a table’s 
partition spec changes, old files remain in the older manifest and newer files 
are written to a new manifest. This is required because a manifest file’s 
schema is based on its partition spec. The partition spec of each manifest is 
used to transform predicates on the table’s data rows into predicates on 
partition values during job planning.
+- V4: Manifests are not bound to a single partition spec. Files with different 
partition specs can coexist in the same manifest because partition values are 
stored in column statistics using source column IDs rather than in a 
partition-spec-specific struct. The `partition-spec-id` in manifest metadata is 
tracked for informational purposes but does not constrain the contents.

Review Comment:
   I'd leave the rationale out of this paragraph. I think it's fine to just say 
that they are not bound to a partition spec. I also think partition-spec-id needs a 
better description here ... The spec id used by the writer when generating this 
data file?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

