paveon opened a new issue, #1132: URL: https://github.com/apache/iceberg-go/issues/1132
### Feature Request / Improvement

`getReferencedFiles` iterates every snapshot and reads all its manifests + entries. Since Iceberg manifests are immutable and shared across snapshots via copy-on-write, the same manifest is read N times, where N is the number of snapshots referencing it. For tables with many snapshots this causes the orphan cleaner to spend 93%+ of CPU time in `getReferencedFiles`, making `DeleteOrphanFiles` effectively unusable on large tables.

Proposed fix: a two-pass approach. First read the lightweight manifest lists to discover unique manifest paths, then read each unique manifest's entries once, in parallel. Also run the S3 walk and referenced-file collection concurrently via errgroup.
