hemanthboyina opened a new pull request, #15519: URL: https://github.com/apache/iceberg/pull/15519
The findDanglingDeletes() method in RemoveDanglingDeletesSparkAction currently loads the ENTRIES metadata table twice: once to compute minimum sequence numbers from data files, and once to find delete file entries. Each load triggers a full scan of all manifest files. This change loads the ENTRIES table once, caches the live entries, and derives both the data-file and delete-file views by filtering the cached result. The cache is released in a finally block, so it is freed even if the action fails. This halves the manifest I/O for tables with many manifests.
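The load-once, cache, filter-twice pattern described above can be illustrated with a minimal plain-Java sketch. It is not the code from the PR and does not use Spark or Iceberg: the class SingleScanSketch, the Entry type, scanLiveEntries(), and the sample file names are all hypothetical stand-ins, and the dangling rule is simplified to "delete sequence number below the minimum data sequence number" with no per-partition grouping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Iceberg.
final class SingleScanSketch {

    // Simplified stand-in for one live row of the ENTRIES metadata table.
    static final class Entry {
        final String path;
        final boolean isDataFile;
        final long sequenceNumber;

        Entry(String path, boolean isDataFile, long sequenceNumber) {
            this.path = path;
            this.isDataFile = isDataFile;
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Counts how often the simulated full manifest scan runs.
    static int scanCount = 0;

    // Stand-in for the expensive load of all live entries.
    static List<Entry> scanLiveEntries() {
        scanCount++;
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("data-1.parquet", true, 5L));
        entries.add(new Entry("data-2.parquet", true, 7L));
        entries.add(new Entry("delete-1.parquet", false, 3L));
        entries.add(new Entry("delete-2.parquet", false, 9L));
        return entries;
    }

    static List<String> findDanglingDeletes() {
        // Scan once and keep the result; both filters below reuse it.
        List<Entry> cached = scanLiveEntries();
        try {
            // First view: minimum sequence number over data files.
            long minDataSeq = cached.stream()
                .filter(e -> e.isDataFile)
                .mapToLong(e -> e.sequenceNumber)
                .min()
                .orElse(Long.MAX_VALUE);

            // Second view: delete files older than every data file.
            return cached.stream()
                .filter(e -> !e.isDataFile)
                .filter(e -> e.sequenceNumber < minDataSeq)
                .map(e -> e.path)
                .collect(Collectors.toList());
        } finally {
            // Stands in for releasing the cache; runs even on failure.
            cached.clear();
        }
    }
}
```

Calling SingleScanSketch.findDanglingDeletes() runs the simulated scan once, where a two-load version would run it twice. In Spark the analogous steps would presumably be persisting the Dataset before the two filters and unpersisting it in the finally block, but the exact calls used in the PR are not shown in this message.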
