laskoviymishka commented on code in PR #1108:
URL: https://github.com/apache/iceberg-go/pull/1108#discussion_r3288418476
##########
table/scanner.go:
##########
@@ -429,6 +429,38 @@ func partitionsMatch(a, b map[int]any) bool {
return true
}
+// buildDVIndex indexes deletion vectors by the data file path they reference.
+// The spec requires at most one DV per data file; a second entry for the same
+// path is rejected with an error.
+func buildDVIndex(dvEntries []iceberg.ManifestEntry)
(map[string]iceberg.ManifestEntry, error) {
+ dvIndex := make(map[string]iceberg.ManifestEntry, len(dvEntries))
+ for _, del := range dvEntries {
+ if ref := del.DataFile().ReferencedDataFile(); ref != nil {
+ if _, exists := dvIndex[*ref]; exists {
+ return nil, fmt.Errorf("can't index multiple
deletion vectors for %s", *ref)
+ }
+ dvIndex[*ref] = del
+ }
+ }
+
+ return dvIndex, nil
+}
+
+// matchDVToData returns the deletion vector that applies to the given data
+// entry, if any. A DV applies only when the data file's sequence number is
+// less than or equal to the DV's sequence number.
+func matchDVToData(dataEntry iceberg.ManifestEntry, dvIndex
map[string]iceberg.ManifestEntry) []iceberg.DataFile {
+ dvEntry, ok := dvIndex[dataEntry.DataFile().FilePath()]
+ if !ok {
+ return nil
+ }
+ if dataEntry.SequenceNum() <= dvEntry.SequenceNum() {
Review Comment:
One subtle thing for a follow-up, not blocking the merge — when
`dataEntry.SequenceNum() > dvEntry.SequenceNum()` we silently return nil where
Java throws a `ValidationException` here. And since `SequenceNum()` returns
`-1` for a nil pointer (see manifest.go around line 2069), a DV entry whose
seq-num pointer is nil compares wrong against any real data seq-num and gets
silently skipped. In practice inheritance should populate seq-nums before this
path, so the live impact is bounded, but it'd be cleaner to either propagate
the spec violation as an error or guard the `-1` sentinel explicitly so the
helper's contract isn't load-bearing on upstream behavior. Happy to file a
follow-up issue with the details.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]