wgtmac commented on code in PR #648:
URL: https://github.com/apache/iceberg-cpp/pull/648#discussion_r3271056940
##########
src/iceberg/update/expire_snapshots.cc:
##########
@@ -608,16 +917,24 @@ Status ExpireSnapshots::Finalize(Result<const
TableMetadata*> commit_result) {
auto metadata_before_expiration_ptr =
apply_result_->metadata_before_expiration;
const TableMetadata& metadata_before_expiration =
*metadata_before_expiration_ptr;
const TableMetadata& metadata_after_expiration = *commit_result.value();
- std::unordered_set<int64_t>
expired_ids(apply_result_->snapshot_ids_to_remove.begin(),
-
apply_result_->snapshot_ids_to_remove.end());
apply_result_.reset();
- // File cleanup is best-effort: log and continue on individual file deletion
failures
- ReachableFileCleanup strategy(ctx_->table->io(), delete_func_);
- return strategy.CleanFiles(metadata_before_expiration,
metadata_after_expiration,
- expired_ids, cleanup_level_);
+ // Pick incremental cleanup when the expiration is a simple linear-ancestry
walk:
+ // no explicit snapshot IDs, no removed snapshots outside main ancestry, and
no
+ // retained snapshots outside main ancestry.
+ const bool can_use_incremental =
+ !specified_snapshot_id_ &&
+ !HasRemovedNonMainAncestors(metadata_before_expiration,
+ metadata_after_expiration) &&
+ !HasNonMainSnapshots(metadata_after_expiration);
+
+ if (can_use_incremental) {
+ return IncrementalFileCleanup(ctx_->table->io(), delete_func_)
+ .CleanFiles(metadata_before_expiration, metadata_after_expiration,
+ cleanup_level_);
+ }
+ return ReachableFileCleanup(ctx_->table->io(), delete_func_)
Review Comment:
Reachable still reads manifests-to-delete with the after metadata. Please
use the pre-expiration metadata for that read so `CleanExpiredMetadata(true)`
cannot prune the spec before cleanup.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]