paveon opened a new pull request, #1118:
URL: https://github.com/apache/iceberg-go/pull/1118

   Fixes https://github.com/apache/iceberg-go/issues/1117
   
   ### What changed
   Reworked `removeSnapshotsUpdate.PostCommit` so each unique manifest file is
   opened at most once per call, regardless of how many expired or retained
   snapshots reference it.
   
   Three steps, each deduplicated:
   
   1) Build the set of manifest paths reachable from any retained snapshot,
   reading only manifest-lists. Cache the resulting `[]ManifestFile` per
   snapshot so the retained-side pass below doesn't re-download each list.
   2) Walk expired snapshots' manifest lists; for each manifest, skip if it's
   in the retained set (its data files are live by definition and the
   manifest itself must not be deleted) or if a prior expired snapshot
   already enumerated it. Otherwise read its entries once.
   3) Subtract live data files via a single walk over each unique retained
   manifest. DELETED entries remain tombstones (unchanged from prior
   semantics).
   
   ### Behavior
   Semantically equivalent to the previous implementation — the final
   `filesToDelete` set is the same on well-formed metadata. No spec change,
   no API change. The only difference is the I/O cost.
   
   ### Performance impact
   For a 491-snapshot incremental-append table, expiring 490 snapshots
   previously triggered roughly sum(1..490) ≈ 120,000 manifest-file
   downloads. The rewrite reduces that to roughly the count of unique
   orphaned manifests (a few hundred in practice). In our test, that is
   two to three orders of magnitude fewer object-store reads.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

