laskoviymishka commented on code in PR #1108:
URL: https://github.com/apache/iceberg-go/pull/1108#discussion_r3288418476


##########
table/scanner.go:
##########
@@ -429,6 +429,38 @@ func partitionsMatch(a, b map[int]any) bool {
        return true
 }
 
+// buildDVIndex indexes deletion vectors by the data file path they reference.
+// The spec requires at most one DV per data file; a second entry for the same
+// path is rejected with an error.
+func buildDVIndex(dvEntries []iceberg.ManifestEntry) 
(map[string]iceberg.ManifestEntry, error) {
+       dvIndex := make(map[string]iceberg.ManifestEntry, len(dvEntries))
+       for _, del := range dvEntries {
+               if ref := del.DataFile().ReferencedDataFile(); ref != nil {
+                       if _, exists := dvIndex[*ref]; exists {
+                               return nil, fmt.Errorf("can't index multiple 
deletion vectors for %s", *ref)
+                       }
+                       dvIndex[*ref] = del
+               }
+       }
+
+       return dvIndex, nil
+}
+
+// matchDVToData returns the deletion vector that applies to the given data
+// entry, if any. A DV applies only when the data file's sequence number is
+// less than or equal to the DV's sequence number.
+func matchDVToData(dataEntry iceberg.ManifestEntry, dvIndex 
map[string]iceberg.ManifestEntry) []iceberg.DataFile {
+       dvEntry, ok := dvIndex[dataEntry.DataFile().FilePath()]
+       if !ok {
+               return nil
+       }
+       if dataEntry.SequenceNum() <= dvEntry.SequenceNum() {

Review Comment:
   One subtle thing for a follow-up, not blocking the merge — when 
`dataEntry.SequenceNum() > dvEntry.SequenceNum()` we silently return nil where 
Java throws a `ValidationException` here. And since `SequenceNum()` returns 
`-1` for a nil pointer (see manifest.go around line 2069), a DV entry whose 
seq-num pointer is nil compares wrong against any real data seq-num and gets 
silently skipped. In practice inheritance should populate seq-nums before this 
path, so the live impact is bounded, but it'd be cleaner to either propagate 
the spec violation as an error or guard the `-1` sentinel explicitly so the 
helper's contract isn't load-bearing on upstream behavior. Happy to file a 
follow-up issue with the details.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to