talatuyarer commented on code in PR #14264:
URL: https://github.com/apache/iceberg/pull/14264#discussion_r2483125421
##########
core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java:
##########
@@ -133,13 +158,473 @@ private static Map<Long, Integer> computeSnapshotOrdinals(Deque<Snapshot> snapsh
return snapshotOrdinals;
}
+ /**
+ * Builds a delete file index for existing deletes that were present before the start snapshot.
+ * These deletes should be applied to data files but should not generate DELETE changelog rows.
+ * Uses manifest pruning and caching to optimize performance.
+ */
+ private DeleteFileIndex buildExistingDeleteIndex(
+ Long fromSnapshotIdExclusive, Map<Long, DeleteFileIndex> addedDeletesBySnapshot) {
+ if (fromSnapshotIdExclusive == null) {
+ return DeleteFileIndex.builderFor(ImmutableList.of()).build();
+ }
+
+ // Check if we need existingDeleteIndex for equality deletes
+ boolean needsExistingDeleteIndex = false;
Review Comment:
I implemented the lazy index build as you suggested and added 5 tests to make
sure it works as expected. If there is any equality delete in my scan range, I
call buildExistingDeleteIndex up front. If there are no equality deletes among
my delete files, I lazily call buildExistingDeleteIndex the first time I see a
DeleteFile. If I have already built the index for equality deletes, I reuse it
for the DeleteFile.
No second pass, no redundant index build.
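
A minimal sketch of the build-once behavior described above, not the code in
the PR. `LazyExistingDeleteIndex`, `LazyIndexDemo`, and `buildsFor` are
hypothetical names used only for illustration, and a plain `List<String>`
stands in for the real index type. The sketch assumes single-threaded use.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in: builds the wrapped index at most once, on first use. */
final class LazyExistingDeleteIndex<T> {
  private final Supplier<T> builder;
  private T index;
  private boolean built;
  private int buildCount;

  LazyExistingDeleteIndex(Supplier<T> builder) {
    this.builder = builder;
  }

  /** Returns the index, building it only if no earlier call has done so. */
  T get() {
    if (!built) {
      index = builder.get();
      built = true;
      buildCount++;
    }
    return index;
  }

  int buildCount() {
    return buildCount;
  }
}

final class LazyIndexDemo {
  /** Counts index builds for a scan range (parameters are illustrative only). */
  static int buildsFor(boolean hasEqualityDeletes, int deleteFilesSeen) {
    LazyExistingDeleteIndex<List<String>> existing =
        new LazyExistingDeleteIndex<>(() -> List.of("existing-delete"));

    // Equality deletes in the scan range: build the index up front.
    if (hasEqualityDeletes) {
      existing.get();
    }

    // Each DeleteFile seen later reuses the index, or triggers the first build.
    for (int i = 0; i < deleteFilesSeen; i++) {
      existing.get();
    }
    return existing.buildCount();
  }

  public static void main(String[] args) {
    System.out.println(buildsFor(true, 3));  // prints 1
    System.out.println(buildsFor(false, 3)); // prints 1
    System.out.println(buildsFor(false, 0)); // prints 0
  }
}
```

In all three cases the index is built at most once, and it is never built when
the scan range has neither equality deletes nor a DeleteFile.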
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]