dttung2905 opened a new pull request, #1147:
URL: https://github.com/apache/iceberg-go/pull/1147
## Summary
Fix `ManifestWriter` mishandling of `min_sequence_number` when live manifest
entries carry sequence number `0`. This is the valid inherited value for
v1-upgraded tables per the Iceberg spec, but the writer previously skipped seq
`0` when computing the minimum and then treated a computed min of `0` as
"unset", causing the manifest list writer to substitute the current commit's
sequence number instead.
## Problem
When a v2+ table carries forward files from a v1 upgrade (or rewrites
manifests containing EXISTING entries with inherited `sequence_number = 0`),
two bugs in `ManifestWriter` produced an incorrect `min_sequence_number`:
1. **`addEntry` skipped seq `0`** — min-seq tracking only ran when
`entry.SequenceNum() > 0`, so v1-inherited entries at sequence `0` were never
counted.
2. **`ToManifestFile` erased a valid min of `0`** — if the computed minimum
was `0`, it was converted to `-1` ("unset").
When `MinSeqNumber == -1`, `ManifestListWriter.AddManifests` substitutes the
commit's sequence number. That inflates `min_sequence_number` on data manifests
that should remain `0`.
## Spec reference
https://github.com/apache/iceberg/blob/7522b02aaf42d51d4b12e211119d159c9dc03552/format/spec.md?plain=1#L1008-L1009
> **`515 sequence_number`** — The sequence number when the manifest was
added to the table; **use 0 when reading v1 manifest lists**
>
> **`516 min_sequence_number`** — The minimum data sequence number of all
live data or delete files in the manifest; **use 0 when reading v1 manifest
lists**
https://github.com/apache/iceberg/blob/7522b02aaf42d51d4b12e211119d159c9dc03552/format/spec.md?plain=1#L1978-L1985
> * Manifest list field `sequence-number` must default to **0**
> * Manifest list field `min-sequence-number` must default to **0**
> * Manifest entry field `sequence_number` must default to **0**
> * Manifest entry field `file_sequence_number` must default to **0**
Sequence number `0` is therefore a **valid, meaningful minimum** — not
equivalent to "unset". Only `-1` (internal sentinel) should trigger assignment
at commit time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]