pvary commented on code in PR #14264:
URL: https://github.com/apache/iceberg/pull/14264#discussion_r2417478771


##########
core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java:
##########
@@ -71,6 +80,12 @@ protected CloseableIterable<ChangelogScanTask> doPlanFiles(
             .filter(manifest -> changelogSnapshotIds.contains(manifest.snapshotId()))
             .toSet();
 
+    // Build delete file index for existing deletes (before the start snapshot)
+    DeleteFileIndex existingDeleteIndex = buildExistingDeleteIndex(fromSnapshotIdExclusive);

Review Comment:
   >The equality delete might be created before the data file but applies to rows 
that exist in the new data file. 
   
   This is not true. An equality delete does not apply to new data files. 
It only applies to data files committed before the snapshot 
that adds the equality delete file.
   
   **OTOH**
   We still need to handle previous deletes. Consider the situation:
   - Snapshot 1 (S1) adds Row 1 (R1)
   - Snapshot 2 (S2) deletes R1
   - Snapshot 3 (S3) adds an equality delete that would delete R1, but does 
not, since R1 is already deleted
   
   In this case, a changelog scan should emit the following rows:
   - S1 emits an insert for R1
   - S2 emits a delete for R1
   - S3 should not emit any row for R1
   
   We can only compute the correct output for S3 if we know which rows were deleted before S3.
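
To make the S1/S2/S3 scenario concrete, here is a minimal sketch of the bookkeeping, assuming a deliberately simplified model. `ChangelogSketch`, `Snapshot`, and `changelog` are hypothetical names for illustration only; they are not part of Iceberg, and this is not how `BaseIncrementalChangelogScan` is implemented. Rows are identified by a plain string key, and a delete is emitted only if the row is still live when the delete is applied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ChangelogSketch {
  /** Hypothetical snapshot model: row keys inserted and row keys targeted by deletes. */
  static final class Snapshot {
    final String name;
    final List<String> inserts;
    final List<String> deletes;

    Snapshot(String name, List<String> inserts, List<String> deletes) {
      this.name = name;
      this.inserts = inserts;
      this.deletes = deletes;
    }
  }

  /** Replays snapshots in order and returns the changelog events as strings. */
  static List<String> changelog(List<Snapshot> snapshots) {
    Set<String> live = new HashSet<>(); // rows that exist and are not yet deleted
    List<String> events = new ArrayList<>();
    for (Snapshot s : snapshots) {
      // Deletes are applied first: in this model they only affect rows
      // committed by earlier snapshots, never rows added by the same snapshot.
      for (String key : s.deletes) {
        // Emit a delete only if the row was still live; a delete that
        // targets an already-deleted row produces no event.
        if (live.remove(key)) {
          events.add(s.name + " DELETE " + key);
        }
      }
      for (String key : s.inserts) {
        live.add(key);
        events.add(s.name + " INSERT " + key);
      }
    }
    return events;
  }

  public static void main(String[] args) {
    List<Snapshot> history = List.of(
        new Snapshot("S1", List.of("R1"), List.of()),
        new Snapshot("S2", List.of(), List.of("R1")),
        new Snapshot("S3", List.of(), List.of("R1"))); // equality delete matching R1
    changelog(history).forEach(System.out::println);
  }
}
```

Running `main` prints `S1 INSERT R1` and `S2 DELETE R1`, and nothing for S3. The `live` set plays the role of the knowledge about earlier deletes: without it, S3 would wrongly emit a second delete for R1.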



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

