paveon opened a new issue, #1117:
URL: https://github.com/apache/iceberg-go/issues/1117

   ### Feature Request / Improvement
   
   ### Apache Iceberg version
     
     main (and v0.5.1)
     
     ### Description
     
     `table.removeSnapshotsUpdate.PostCommit` (`table/updates.go`) walks every expired
     snapshot's manifest list and opens each referenced manifest file individually:
     
     ```go
     for _, snapId := range u.SnapshotIDs {
         snap := preTable.SnapshotByID(snapId)
         mans, err := snap.Manifests(prefs)
         for _, man := range mans {
             for entry, err := range man.Entries(prefs, false) { ... }
         }
     }
     ```
   
     Iceberg manifests are shared by reference across snapshots: an APPEND commit
     produces a new manifest list pointing at all the existing manifests plus 1–2
     new ones. So a shared manifest gets opened once per expired snapshot that
     references it, instead of once total.
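
A minimal sketch of the dedupe described above, assuming manifests can be identified by their file path. The `manifestRef` and `snapshot` types and the `uniqueManifests` helper are hypothetical stand-ins for illustration, not iceberg-go APIs:

```go
package main

import "fmt"

// manifestRef and snapshot are hypothetical stand-ins, not iceberg-go types.
type manifestRef struct {
	Path string // location of the manifest file in object storage
}

type snapshot struct {
	ID        int64
	Manifests []manifestRef // contents of this snapshot's manifest list
}

// uniqueManifests returns each manifest referenced by the given snapshots
// exactly once, in first-seen order, keyed by file path.
func uniqueManifests(snaps []snapshot) []manifestRef {
	seen := make(map[string]struct{})
	var out []manifestRef
	for _, s := range snaps {
		for _, m := range s.Manifests {
			if _, ok := seen[m.Path]; ok {
				continue // already collected via an earlier snapshot
			}
			seen[m.Path] = struct{}{}
			out = append(out, m)
		}
	}
	return out
}

func main() {
	a := manifestRef{Path: "m-a.avro"}
	b := manifestRef{Path: "m-b.avro"}
	c := manifestRef{Path: "m-c.avro"}
	expired := []snapshot{
		{ID: 1, Manifests: []manifestRef{a}},
		{ID: 2, Manifests: []manifestRef{a, b}},
		{ID: 3, Manifests: []manifestRef{a, b, c}},
	}
	total := 0
	for _, s := range expired {
		total += len(s.Manifests)
	}
	fmt.Println("references:", total, "unique:", len(uniqueManifests(expired)))
}
```

With a set like this, the entries of each manifest would only need to be read once, no matter how many expired snapshots point at it.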
   
     For a table with 491 incremental-append snapshots, expiring 490 of them
     causes ~sum(1..490) = ~120k manifest-file downloads from object storage where
     ~500 unique reads would suffice. We observed a single-table expire running
     for hours in this state.
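
The arithmetic can be checked with a small model. This is a sketch under a simplifying assumption that is not taken from the real table: every append adds exactly one manifest, so snapshot `k` references `k` manifests. The `manifestOpens` function is hypothetical:

```go
package main

import "fmt"

// manifestOpens models a table where snapshot k's manifest list references
// manifests 1..k (each append adds exactly one manifest) and the oldest
// `expired` snapshots are removed. It returns the number of manifest opens
// when every expired snapshot is walked independently, and the number of
// distinct manifests those snapshots reference.
func manifestOpens(expired int) (perSnapshot, unique int) {
	seen := make(map[int]struct{})
	for k := 1; k <= expired; k++ {
		for m := 1; m <= k; m++ {
			perSnapshot++
			seen[m] = struct{}{}
		}
	}
	return perSnapshot, len(seen)
}

func main() {
	per, uniq := manifestOpens(490)
	fmt.Println(per, uniq) // prints "120295 490"
}
```

Under this model the per-snapshot walk costs 490 * 491 / 2 = 120,295 opens against 490 distinct manifests, which matches the ~120k versus ~500 figures above.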
   
     The retained-snapshot pass below has the same shape and could also dedupe
     across retained snapshots.
   
     ### Willing to contribute
   
     - I can contribute a fix for this bug independently


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]