RussellSpitzer opened a new pull request, #15656:
URL: https://github.com/apache/iceberg/pull/15656

   Working on #15634 led me to several more places where we read manifests 
without passing in `specsById` that weren't caught in #15241 or #15575, so I'm 
fixing them here. The biggest change is that we now need to be able to 
aggregate changes across multiple snapshots.
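   Each Iceberg manifest records the ID of the partition spec it was written with, and the `specsById` overloads let the caller supply the table's spec map for resolving that ID. Below is a toy model of that lookup under simplified assumptions. `SpecLookupSketch`, `Spec`, `Manifest`, and `resolveSpec` are hypothetical names for illustration only, not Iceberg's `ManifestReader` or `PartitionSpec`.

```java
import java.util.List;
import java.util.Map;

public class SpecLookupSketch {
  // Hypothetical stand-in for a partition spec: an ID plus its partition fields.
  record Spec(int specId, List<String> partitionFields) {}

  // Hypothetical stand-in for a manifest: a path plus the spec ID it was written with.
  record Manifest(String path, int partitionSpecId) {}

  // Resolve the manifest's spec from the caller-supplied specsById map.
  static Spec resolveSpec(Manifest manifest, Map<Integer, Spec> specsById) {
    Spec spec = specsById.get(manifest.partitionSpecId());
    if (spec == null) {
      throw new IllegalArgumentException(
          "No spec with id " + manifest.partitionSpecId() + " for " + manifest.path());
    }
    return spec;
  }

  public static void main(String[] args) {
    // Two specs, as after partition evolution: unpartitioned, then day(ts).
    Map<Integer, Spec> specsById = Map.of(
        0, new Spec(0, List.of()),
        1, new Spec(1, List.of("day(ts)")));
    System.out.println(resolveSpec(new Manifest("m0.avro", 0), specsById).partitionFields()); // []
    System.out.println(resolveSpec(new Manifest("m1.avro", 1), specsById).partitionFields()); // [day(ts)]
  }
}
```

Older manifests keep pointing at older spec IDs, which is why the whole map is passed rather than only the current spec.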
   
   Generalize SnapshotChanges to support multi-snapshot aggregation via 
snapshots(List<Snapshot>) on the builder, replacing the need for 
SnapshotUtil.newFilesBetween in CherryPickOperation. Deprecate the remaining 
newFilesBetween overloads in SnapshotUtil. Update CatalogUtil, 
ReachableFileCleanup, and PartitionStatsHandler to pass specsById when opening 
manifests.
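   The new `snapshots(List<Snapshot>)` builder method lets a single `SnapshotChanges` cover several snapshots at once. The sketch below models that idea in plain Java. `ChangesSketch`, its `Builder`, and the `Snapshot`/`Changes` records are hypothetical and only track added file paths, so they are not the real `SnapshotChanges` API.

```java
import java.util.ArrayList;
import java.util.List;

public class ChangesSketch {
  // Hypothetical snapshot model: an ID plus the data file paths it added.
  record Snapshot(long snapshotId, List<String> addedFiles) {}

  // Hypothetical aggregate of changes across one or more snapshots.
  record Changes(List<String> addedFiles) {}

  static final class Builder {
    private final List<Snapshot> snapshots = new ArrayList<>();

    // Add a single snapshot to the set being aggregated.
    Builder snapshot(Snapshot snapshot) {
      snapshots.add(snapshot);
      return this;
    }

    // Add several snapshots at once, modeled on the idea of snapshots(List<Snapshot>).
    Builder snapshots(List<Snapshot> more) {
      snapshots.addAll(more);
      return this;
    }

    // Concatenate added files in the order the snapshots were supplied.
    Changes build() {
      List<String> added = new ArrayList<>();
      for (Snapshot s : snapshots) {
        added.addAll(s.addedFiles());
      }
      return new Changes(List.copyOf(added));
    }
  }

  public static void main(String[] args) {
    Snapshot s1 = new Snapshot(1L, List.of("a.parquet"));
    Snapshot s2 = new Snapshot(2L, List.of("b.parquet", "c.parquet"));
    Changes changes = new Builder().snapshots(List.of(s1, s2)).build();
    System.out.println(changes.addedFiles()); // [a.parquet, b.parquet, c.parquet]
  }
}
```

Keeping the single-snapshot entry point alongside the list version means existing single-snapshot callers would not need to change.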
   
   Production Changes
   
   1. `SnapshotChanges` gets a `snapshots` method so that we can get changes 
from multiple snapshots at the same time.
   2. `CherryPickOperation` was using `SnapshotUtil.newFilesBetween`, which 
reads manifests directly. Switched it to `SnapshotChanges` with the new 
`snapshots` method.
   3. Deprecated the remaining `SnapshotUtil.newFilesBetween` overloads.
   4. Replaced usages of `ManifestFiles.open(manifest, io)` with 
`ManifestFiles.open(manifest, io, specsById)`
      - `CatalogUtil`
      - `PartitionStatsHandler`
      - `ReachableFileCleanup`
      - `MicroBatchUtils` (Spark v4.1 only)
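   For `CherryPickOperation`, the change amounts to collecting the snapshots in a range and then aggregating their changes in one step, instead of calling `SnapshotUtil.newFilesBetween`. A plain-Java sketch, assuming the range runs along parent pointers from an exclusive start snapshot to an inclusive end snapshot. `CherryPickRangeSketch`, `snapshotsBetween`, and `addedFiles` are hypothetical helpers, not Iceberg code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CherryPickRangeSketch {
  // Hypothetical snapshot model with a parent pointer (null for the first snapshot).
  record Snapshot(long snapshotId, Long parentId, List<String> addedFiles) {}

  // Walk parent pointers from endId back to, but not including, startId.
  // A null startId walks all the way to the root. Result is ordered oldest first.
  static List<Snapshot> snapshotsBetween(Long startId, long endId, Map<Long, Snapshot> lookup) {
    List<Snapshot> result = new ArrayList<>();
    Long current = endId;
    while (current != null && !current.equals(startId)) {
      Snapshot snapshot = lookup.get(current);
      if (snapshot == null) {
        throw new IllegalStateException("Missing snapshot " + current);
      }
      result.add(snapshot);
      current = snapshot.parentId();
    }
    Collections.reverse(result);
    return result;
  }

  // Aggregate added files over the whole range in a single pass.
  static List<String> addedFiles(List<Snapshot> snapshots) {
    List<String> files = new ArrayList<>();
    for (Snapshot s : snapshots) {
      files.addAll(s.addedFiles());
    }
    return files;
  }

  public static void main(String[] args) {
    Map<Long, Snapshot> lookup = Map.of(
        1L, new Snapshot(1L, null, List.of("a.parquet")),
        2L, new Snapshot(2L, 1L, List.of("b.parquet")),
        3L, new Snapshot(3L, 2L, List.of("c.parquet")));
    List<Snapshot> range = snapshotsBetween(1L, 3L, lookup);
    System.out.println(addedFiles(range)); // [b.parquet, c.parquet]
  }
}
```

Separating "which snapshots" from "what changed in them" is what lets one aggregation path serve both single-snapshot and multi-snapshot callers.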

   Plus many other test changes.

-- 
This is an automated message from the Apache Git Service.