wombatu-kun opened a new pull request, #16390:
URL: https://github.com/apache/iceberg/pull/16390

   Closes #15659
   
   ## What
   
   `SnapshotChanges` previously exposed only cached accessors that eagerly 
materialize all file changes into in-memory lists and return `Iterable`. Some 
callers (for example replaced-partition validation, see #13556) need to stream 
changes without loading every file into memory. This PR adds streaming 
`CloseableIterable` accessors and re-implements the cached accessors as thin 
wrappers over them, exactly as suggested in #15659.
   
   ## Changes
   
   - Add `addedDataFilesIterable()`, `removedDataFilesIterable()`, 
`addedDeleteFilesIterable()` and `removedDeleteFilesIterable()`, which return 
lazily evaluated `CloseableIterable`s. Results are not cached, and the caller 
must close each iterable.
   - Re-implement the cached accessors (`addedDataFiles()`, 
`removedDataFiles()`, `addedDeleteFiles()`, `removedDeleteFiles()`) as 
`materialize()` wrappers over the streaming methods, preserving their existing 
caching identity contract.
   - Replace the per-type cache/read methods with a single generic 
manifest-reading pipeline. Manifests are still read single-threaded by default 
and in parallel with a bounded queue when an executor is configured via 
`Builder.executeWith`.
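
To illustrate the split between streaming and cached accessors described above, here is a minimal, self-contained sketch. It does not use Iceberg itself: `FileSource` is a hypothetical stand-in for a closeable iterable, file paths are plain strings, and the counters (`opens`, `closes`, `entriesRead`) exist only to make the behavior observable. Only the accessor names and `materialize()` come from the description; the actual implementation in the PR may differ.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingAccessorSketch {
  /** Hypothetical stand-in for a closeable iterable; not Iceberg's interface. */
  interface FileSource<T> extends Iterable<T>, Closeable {
    @Override
    void close(); // narrowed so the sketch needs no checked-exception handling
  }

  final List<String> manifestEntries; // pretend manifest contents (file paths)
  int opens = 0;       // how many streaming iterables were created
  int closes = 0;      // how many were closed
  int entriesRead = 0; // how many entries were actually pulled
  private List<String> cachedAdded; // filled by the first cached call

  StreamingAccessorSketch(List<String> manifestEntries) {
    this.manifestEntries = manifestEntries;
  }

  /** Streaming accessor: a fresh, uncached iterable on every call. */
  FileSource<String> addedDataFilesIterable() {
    opens++;
    return new FileSource<String>() {
      @Override
      public Iterator<String> iterator() {
        Iterator<String> delegate = manifestEntries.iterator();
        return new Iterator<String>() {
          @Override
          public boolean hasNext() {
            return delegate.hasNext();
          }

          @Override
          public String next() {
            String path = delegate.next();
            entriesRead++; // counted only when the caller pulls an entry
            return path;
          }
        };
      }

      @Override
      public void close() {
        closes++;
      }
    };
  }

  /** Cached accessor: materializes the stream once, then returns the same list. */
  List<String> addedDataFiles() {
    if (cachedAdded == null) {
      cachedAdded = materialize(addedDataFilesIterable());
    }
    return cachedAdded;
  }

  /** Drains the source into a list and always closes it. */
  static <T> List<T> materialize(FileSource<T> source) {
    try (FileSource<T> closing = source) {
      List<T> files = new ArrayList<>();
      for (T file : closing) {
        files.add(file);
      }
      return files;
    }
  }

  public static void main(String[] args) {
    StreamingAccessorSketch changes =
        new StreamingAccessorSketch(Arrays.asList("a.parquet", "b.parquet", "c.parquet"));
    try (FileSource<String> stream = changes.addedDataFilesIterable()) {
      for (String path : stream) {
        System.out.println("first streamed file: " + path);
        break; // stop early; the remaining entries are never pulled
      }
    }
    System.out.println("entries read so far: " + changes.entriesRead);
    System.out.println("same cached list: " + (changes.addedDataFiles() == changes.addedDataFiles()));
  }
}
```

The try-with-resources block is what a caller of a streaming accessor would be expected to write, since nothing else closes the iterable. The cached accessor hides that obligation behind `materialize()` and keeps returning the same list instance.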
   
   ## Trade-off
   
   Reading both added and removed changes through the cached accessors now 
performs two manifest passes instead of one. This keeps the change minimal and 
strictly additive; a single-pass optimization can be added in a follow-up if it 
proves necessary.
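
The cost described above can be seen in a small model. This is only a sketch under stated assumptions: `Entry`, `scan()` and the `manifestPasses` counter are hypothetical names, and an in-memory list stands in for the manifests. It shows that when each cached accessor is built on its own filtered scan, asking for both added and removed files scans the manifests twice, while repeated calls stay cached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassSketch {
  /** Stand-in for manifest entry statuses; EXISTING entries are not changes. */
  enum Status { ADDED, DELETED, EXISTING }

  /** Hypothetical manifest entry: a status plus a file path. */
  static final class Entry {
    final Status status;
    final String path;

    Entry(Status status, String path) {
      this.status = status;
      this.path = path;
    }
  }

  final List<Entry> manifest; // in-memory stand-in for the snapshot's manifests
  int manifestPasses = 0;     // incremented once per full scan
  private List<String> added;
  private List<String> removed;

  TwoPassSketch(List<Entry> manifest) {
    this.manifest = manifest;
  }

  /** One full pass over the manifest, keeping only entries with the wanted status. */
  private List<String> scan(Status wanted) {
    manifestPasses++;
    List<String> paths = new ArrayList<>();
    for (Entry entry : manifest) {
      if (entry.status == wanted) {
        paths.add(entry.path);
      }
    }
    return paths;
  }

  List<String> addedDataFiles() {
    if (added == null) {
      added = scan(Status.ADDED);
    }
    return added;
  }

  List<String> removedDataFiles() {
    if (removed == null) {
      removed = scan(Status.DELETED);
    }
    return removed;
  }

  public static void main(String[] args) {
    TwoPassSketch changes = new TwoPassSketch(Arrays.asList(
        new Entry(Status.ADDED, "new.parquet"),
        new Entry(Status.DELETED, "old.parquet"),
        new Entry(Status.EXISTING, "kept.parquet")));
    changes.addedDataFiles();
    changes.removedDataFiles();
    changes.addedDataFiles(); // served from the cache, no extra pass
    System.out.println("manifest passes: " + changes.manifestPasses); // prints 2
  }
}
```

A single-pass variant would have to fill both lists during one scan, which couples the two accessors; that is the optimization deferred to a follow-up.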
   
   ## Testing
   
   Adds 10 focused tests in `TestSnapshotChanges` (the 3 existing tests are 
unchanged), each guarding a distinct code path: streaming added/removed data 
and delete files, equivalence with the cached results, non-caching semantics, 
statistics retention versus stripping (`copy()` vs `copyWithoutStats()`), 
EXISTING-entry exclusion, snapshot-id manifest filtering, and the parallel 
execution path. `:iceberg-core:test`, `spotlessCheck` and `revapi` all pass; 
the change is purely additive so no revapi exception is required.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

