dttung2905 opened a new pull request, #1147:
URL: https://github.com/apache/iceberg-go/pull/1147

   ## Summary
   Fix `ManifestWriter` mishandling of `min_sequence_number` when live manifest 
entries carry sequence number `0`. This is the valid inherited value for 
v1-upgraded tables per the Iceberg spec, but the writer previously skipped seq 
`0` when computing the minimum and then treated a computed min of `0` as 
"unset", causing the manifest list writer to substitute the current commit's 
sequence number instead.
   ## Problem
   When a v2+ table carries forward files from a v1 upgrade (or rewrites 
manifests containing EXISTING entries with inherited `sequence_number = 0`), 
two bugs in `ManifestWriter` produced an incorrect `min_sequence_number`:
   1. **`addEntry` skipped seq `0`** — min-seq tracking only ran when 
`entry.SequenceNum() > 0`, so v1-inherited entries at sequence `0` were never 
counted.
   2. **`ToManifestFile` erased a valid min of `0`** — if the computed minimum 
was `0`, it was converted to `-1` ("unset").
   When `MinSeqNumber == -1`, `ManifestListWriter.AddManifests` substitutes the 
commit's sequence number. That inflates `min_sequence_number` on data manifests 
that should remain `0`.
   
   ## Spec reference
   
https://github.com/apache/iceberg/blob/7522b02aaf42d51d4b12e211119d159c9dc03552/format/spec.md?plain=1#L1008-L1009
   
   
   > **`515 sequence_number`** — The sequence number when the manifest was 
added to the table; **use 0 when reading v1 manifest lists**
   >
   > **`516 min_sequence_number`** — The minimum data sequence number of all 
live data or delete files in the manifest; **use 0 when reading v1 manifest 
lists**
   
   
   
https://github.com/apache/iceberg/blob/7522b02aaf42d51d4b12e211119d159c9dc03552/format/spec.md?plain=1#L1978-L1985
   
   > * Manifest list field `sequence-number` must default to **0**
   > * Manifest list field `min-sequence-number` must default to **0**
   > * Manifest entry field `sequence_number` must default to **0**
   > * Manifest entry field `file_sequence_number` must default to **0**
   Sequence number `0` is therefore a **valid, meaningful minimum** — not 
equivalent to "unset". Only `-1` (internal sentinel) should trigger assignment 
at commit time.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to