wgtmac commented on code in PR #648:
URL: https://github.com/apache/iceberg-cpp/pull/648#discussion_r3264532056


##########
src/iceberg/update/expire_snapshots.cc:
##########
@@ -331,6 +332,309 @@ class ReachableFileCleanup : public FileCleanupStrategy {
   }
 };
 
+/// \brief Incremental file cleanup strategy for simple linear-ancestry 
expirations.
+///
+/// Mirrors Java's IncrementalFileCleanup. Only safe when:
+///   * No snapshot IDs were explicitly listed for expiration.
+///   * No removed snapshots lived outside the current main ancestry.
+///   * No retained snapshots live outside the current main ancestry.
+///
+/// Each manifest is attributed to its writer snapshot via added_snapshot_id, 
so
+/// two snapshot passes are enough -- one over retained snapshots to learn 
which
+/// manifests are still live, one over expired snapshots to learn which 
manifests,
+/// manifest lists, and data files to drop. Cherry-pick protection via
+/// SnapshotSummaryFields::kSourceSnapshotId prevents removing data that was
+/// logically introduced by a snapshot whose changes are still present in the
+/// current state under a different id.
+///
+/// TODO(shangxinli): Add multi-threaded manifest reading and file deletion 
support.
+class IncrementalFileCleanup : public FileCleanupStrategy {
+ public:
+  using FileCleanupStrategy::FileCleanupStrategy;
+
+  Status CleanFiles(const TableMetadata& metadata_before_expiration,
+                    const TableMetadata& metadata_after_expiration,
+                    const std::unordered_set<int64_t>& expired_snapshot_ids,

Review Comment:
   Java derives expired IDs from `beforeExpiration` and `afterExpiration` 
inside cleanup. Please do the same here instead of passing a separate set that 
can drift from the two metadata states.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to