wombatu-kun opened a new pull request, #16390: URL: https://github.com/apache/iceberg/pull/16390
Closes #15659

## What

`SnapshotChanges` previously exposed only cached accessors that eagerly materialize all file changes into in-memory lists and return `Iterable`. Some callers (for example replaced-partition validation, see #13556) need to stream changes without loading every file into memory. This PR adds streaming `CloseableIterable` accessors and re-implements the cached accessors as thin wrappers over them, exactly as suggested in #15659.

## Changes

- Add `addedDataFilesIterable()`, `removedDataFilesIterable()`, `addedDeleteFilesIterable()` and `removedDeleteFilesIterable()`, which return lazily-evaluated `CloseableIterable`s that the caller must close and that are not cached.
- Re-implement the cached accessors (`addedDataFiles()`, `removedDataFiles()`, `addedDeleteFiles()`, `removedDeleteFiles()`) as `materialize()` wrappers over the streaming methods, preserving their existing caching identity contract.
- Replace the per-type cache/read methods with a single generic manifest-reading pipeline. Manifests are still read single-threaded by default and in parallel with a bounded queue when an executor is configured via `Builder.executeWith`.

## Trade-off

Reading both added and removed changes through the cached accessors now performs two manifest passes instead of one. This keeps the change minimal and strictly additive; a single-pass optimization can be added in a follow-up if it proves necessary.

## Testing

Adds 10 focused tests in `TestSnapshotChanges` (the 3 existing tests are unchanged), each guarding a distinct code path: streaming added/removed data and delete files, equivalence with the cached results, non-caching semantics, statistics retention versus stripping (`copy()` vs `copyWithoutStats()`), EXISTING-entry exclusion, snapshot-id manifest filtering, and the parallel execution path.

`:iceberg-core:test`, `spotlessCheck` and `revapi` all pass; the change is purely additive so no revapi exception is required.
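
## Illustrative sketch

For reviewers who want the shape of the change at a glance, the relationship between the streaming accessors, the `materialize()` wrappers, and the two-pass trade-off can be sketched roughly as below. This is a simplified, self-contained model and not the code in this PR: `StreamingChangesSketch`, `ClosableSource`, `scan`, the `passes` counter, and the `+`/`-` string entries are hypothetical stand-ins for `SnapshotChanges`, `CloseableIterable`, and real manifest entries.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class StreamingChangesSketch {
  /** Hypothetical stand-in for CloseableIterable: an Iterable the caller must close. */
  interface ClosableSource<T> extends Iterable<T>, Closeable {
    @Override
    void close(); // narrowed so this sketch needs no checked-exception handling
  }

  // Fake "manifest": entries prefixed with '+' were added, '-' were removed.
  private final List<String> manifestEntries;
  private final AtomicInteger passes = new AtomicInteger();
  private List<String> cachedAdded;
  private List<String> cachedRemoved;

  StreamingChangesSketch(List<String> manifestEntries) {
    this.manifestEntries = manifestEntries;
  }

  /** Streaming accessors: each iteration is a fresh lazy pass; nothing is cached. */
  ClosableSource<String> addedDataFilesIterable() {
    return scan('+');
  }

  ClosableSource<String> removedDataFilesIterable() {
    return scan('-');
  }

  /** Cached accessors: thin wrappers that materialize the stream once and reuse the list. */
  List<String> addedDataFiles() {
    if (cachedAdded == null) {
      cachedAdded = materialize(addedDataFilesIterable());
    }
    return cachedAdded;
  }

  List<String> removedDataFiles() {
    if (cachedRemoved == null) {
      cachedRemoved = materialize(removedDataFilesIterable());
    }
    return cachedRemoved;
  }

  /** Number of passes started over the fake manifest so far. */
  int passes() {
    return passes.get();
  }

  private ClosableSource<String> scan(char marker) {
    return new ClosableSource<String>() {
      @Override
      public Iterator<String> iterator() {
        passes.incrementAndGet(); // one pass per iteration
        return manifestEntries.stream()
            .filter(e -> e.charAt(0) == marker)
            .map(e -> e.substring(1))
            .iterator(); // stream iterators pull elements lazily
      }

      @Override
      public void close() {
        // A real implementation would release manifest readers here.
      }
    };
  }

  private static <T> List<T> materialize(ClosableSource<T> source) {
    try (ClosableSource<T> s = source) {
      List<T> out = new ArrayList<>();
      for (T item : s) {
        out.add(item);
      }
      return Collections.unmodifiableList(out);
    }
  }

  public static void main(String[] args) {
    StreamingChangesSketch changes =
        new StreamingChangesSketch(List.of("+a.parquet", "-old.parquet", "+b.parquet"));

    // Streaming use: the caller owns the iterable and closes it.
    try (ClosableSource<String> added = changes.addedDataFilesIterable()) {
      for (String file : added) {
        System.out.println("added " + file);
      }
    }

    // Cached use: repeated calls return the same list instance.
    System.out.println(changes.addedDataFiles() == changes.addedDataFiles());
    changes.removedDataFiles();
    System.out.println("passes: " + changes.passes());
  }
}
```

In this model, reading both the added and the removed lists through the cached accessors costs one pass each, which mirrors the trade-off described above; the streaming call in `main` accounts for the additional pass that the counter reports.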
🤖 Generated with [Claude Code](https://claude.com/claude-code)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
