laskoviymishka opened a new issue, #1050:
URL: https://github.com/apache/iceberg-go/issues/1050
Parent: #589.
`PlanFiles` builds `dvIndex` by iterating DV manifest entries and grouping
by `ReferencedDataFile()`. Two gaps relative to Java's `DeleteFileIndex`:
The first is a missing sequence-number guard. The spec says a DV applies to
a data file only when the data file's `data_sequence_number` is less than or
equal to the DV's `data_sequence_number`. Java's `DeleteFileIndex.findDV`
enforces this with a `ValidationException`. The Go `dvIndex` build skips the
check, so a stale DV from a prior epoch paired with a newer data file is
silently applied. This over-deletion path only triggers on malformed
manifests, but the guard is cheap.
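
A minimal sketch of what the guard could look like. The `dvEntry` type and the `checkDVSequence` helper are hypothetical stand-ins for illustration, not the actual iceberg-go types or the final shape of the fix:

```go
package main

import "fmt"

// dvEntry is a hypothetical, simplified view of a DV manifest entry.
// The real iceberg-go entry and data file types carry far more state.
type dvEntry struct {
	Path           string // location of the DV file
	ReferencedFile string // data file the DV points at
	SeqNum         int64  // the DV's data_sequence_number
}

// checkDVSequence rejects a DV whose data sequence number is lower than
// the data file's. A DV applies only when dataSeq <= dv.SeqNum.
func checkDVSequence(dv dvEntry, dataSeq int64) error {
	if dv.SeqNum < dataSeq {
		return fmt.Errorf(
			"DV %s data sequence number (%d) must be greater than or equal to data file %s sequence number (%d)",
			dv.Path, dv.SeqNum, dv.ReferencedFile, dataSeq)
	}
	return nil
}
```

Calling this once per matched data file inside the index build would turn a stale pairing into a planning error rather than an applied delete.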
The second is that multiple DVs per data file are silently unioned. Java's
`Builder.add` errors with `ValidationException("Can't index multiple DVs for
%s")`. The Go `dvIndex` is `map[string][]iceberg.DataFile`, so any number of
entries are accepted per path. The scanner's `readAllDeletionVectors`
defensively rejects this at read time as of #996, but the canonical home for
the check is planning, so that callers inspecting
`FileScanTask.DeletionVectorFiles` directly see only validated state.
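
One way the uniqueness check could be expressed at index-build time, as a sketch only. The `dvFile` type and `buildDVIndex` function are hypothetical, and the map value is a single file here instead of the current `[]iceberg.DataFile` slice:

```go
package main

import "fmt"

// dvFile is a hypothetical stand-in for a DV delete file.
type dvFile struct {
	Path           string // location of the DV file
	ReferencedFile string // data file the DV applies to
}

// buildDVIndex maps each data file path to at most one DV and fails on
// the first duplicate, loosely mirroring the Java "Can't index multiple
// DVs" validation described above.
func buildDVIndex(dvs []dvFile) (map[string]dvFile, error) {
	index := make(map[string]dvFile, len(dvs))
	for _, dv := range dvs {
		if prev, ok := index[dv.ReferencedFile]; ok {
			return nil, fmt.Errorf("can't index multiple DVs for %s: %s and %s",
				dv.ReferencedFile, prev.Path, dv.Path)
		}
		index[dv.ReferencedFile] = dv
	}
	return index, nil
}
```

Keeping the existing slice-valued map and erroring when a second entry is appended would give the same guarantee with a smaller diff; the single-value map just makes the invariant visible in the type.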
Both checks fit naturally in one PR, since they live in the same `dvIndex`
construction loop. The loop was last edited by #996, so this work
text-conflicts with that PR's pos-delete suppression block; rebase if both
are in flight together.