danielcweeks commented on code in PR #14264:
URL: https://github.com/apache/iceberg/pull/14264#discussion_r2535808483


##########
core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java:
##########
@@ -179,5 +735,56 @@ public CloseableIterable<ChangelogScanTask> apply(
             }
           });
     }
+
+    /**
+     * Gets delete files that apply to an ADDED data file. Only includes 
deletes added in the same
+     * snapshot as the file.
+     */
+    private DeleteFile[] getDeletesForAddedFile(
+        ManifestEntry<DataFile> entry, long commitSnapshotId) {
+      DeleteFileIndex addedDeleteIndex = 
addedDeletesBySnapshot.get(commitSnapshotId);
+      return addedDeleteIndex == null || addedDeleteIndex.isEmpty()
+          ? NO_DELETES
+          : addedDeleteIndex.forEntry(entry);
+    }
+
+    /**
+     * Gets all delete files that were applied to a DELETED data file up to 
the point it was
+     * deleted. This includes existing deletes and all deletes added in the 
scan range up to (but
+     * not including) the deletion snapshot.
+     */
+    private DeleteFile[] getDeletesForDeletedFile(
+        ManifestEntry<DataFile> entry, long deletionSnapshotId) {
+
+      List<DeleteFile> allDeletes = Lists.newArrayList();
+
+      // Build existing delete index lazily when first DELETED entry is 
encountered
+      DeleteFileIndex existingDeleteIndex = existingDeleteIndexSupplier.get();
+      DeleteFile[] existingDeletes =
+          existingDeleteIndex.isEmpty() ? NO_DELETES : 
existingDeleteIndex.forEntry(entry);
+      for (DeleteFile df : existingDeletes) {
+        allDeletes.add(df);
+      }
+
+      // Add all deletes from snapshots in the scan range BEFORE the deletion
+      List<DeleteFile> cumulativeDeletes = 
cumulativeDeletesMap.get(deletionSnapshotId);
+      if (cumulativeDeletes != null) {
+        // For each cumulative delete, check if it applies to this entry
+        for (DeleteFile df : cumulativeDeletes) {

Review Comment:
   This feels super expensive.  Don't we know that it applies? If there's an 
existing POS/DV against a removed file, it would have to apply.  If there's an 
equality delete, you would just include it regardless and let the task apply 
it?  Overall, it feels like any associated delete would apply or could/would be 
deconflicted by the task.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to