shangxinli opened a new pull request, #592:
URL: https://github.com/apache/iceberg-cpp/pull/592

   ## Summary
   
   - Implement the file cleanup logic missing from expire snapshots (#490 noted "TODO: File recycling will be added in a followup PR")
   - Port the "reachable file cleanup" strategy from Java's 
`ReachableFileCleanup`
   - Single-threaded implementation; multi-threaded and incremental cleanup are left as TODOs
   
   ## Changes
   
   - Add `Finalize()` override called after successful commit to clean up 
expired files
   - Add `CleanExpiredFiles()` implementing the reachable file cleanup strategy:
     1. Collect manifest paths from expired and retained snapshots
     2. Prune manifests still referenced by retained snapshots
     3. Find data files referenced only by manifests being deleted, then subtract files still reachable from retained manifests
     4. Delete orphaned manifests, manifest lists, and statistics files
   - Best-effort deletion: suppress errors on individual file deletions to 
avoid blocking metadata updates (matching Java's `suppressFailureWhenFinished`)
   - Branch/tag awareness: retained snapshot set includes all snapshots 
reachable from any ref
   - Respect `CleanupLevel`: `kNone` skips all, `kMetadataOnly` skips data 
files, `kAll` cleans everything
   - Use `FileIO::DeleteFile` for filesystem compatibility (S3, HDFS, local)
   - 5 new tests for file cleanup behavior
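
The reachable file cleanup steps listed above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `Snapshot`, `ManifestContents`, `FilesToDelete`, and `DeleteBestEffort` are hypothetical stand-ins, manifests are modeled as an in-memory map rather than read through `FileIO`, statistics files are omitted, and the retained snapshot list is assumed to already contain every snapshot reachable from any branch or tag. The delete callback returning `bool` is also an assumption made for the sketch.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the iceberg-cpp types.
enum class CleanupLevel { kNone, kMetadataOnly, kAll };

struct Snapshot {
  std::string manifest_list;           // path of the snapshot's manifest list
  std::vector<std::string> manifests;  // manifest paths listed in it
};

// Manifest path -> data file paths it references (assumed to be pre-loaded).
using ManifestContents = std::map<std::string, std::vector<std::string>>;

// Returns the set of paths that are safe to delete under the given level.
std::set<std::string> FilesToDelete(const std::vector<Snapshot>& expired,
                                    const std::vector<Snapshot>& retained,
                                    const ManifestContents& contents,
                                    CleanupLevel level) {
  std::set<std::string> to_delete;
  if (level == CleanupLevel::kNone) return to_delete;  // kNone skips everything

  // Step 1: collect manifest and manifest-list paths of retained snapshots.
  std::set<std::string> retained_manifests;
  std::set<std::string> retained_lists;
  for (const auto& s : retained) {
    retained_manifests.insert(s.manifests.begin(), s.manifests.end());
    retained_lists.insert(s.manifest_list);
  }

  // Step 2: keep only expired manifests that no retained snapshot references.
  std::set<std::string> orphaned_manifests;
  for (const auto& s : expired) {
    if (retained_lists.count(s.manifest_list) == 0) to_delete.insert(s.manifest_list);
    for (const auto& m : s.manifests) {
      if (retained_manifests.count(m) == 0) orphaned_manifests.insert(m);
    }
  }

  // Step 3: data files in orphaned manifests, minus files still reachable
  // from retained manifests. Skipped for kMetadataOnly.
  if (level == CleanupLevel::kAll) {
    std::set<std::string> reachable;
    for (const auto& m : retained_manifests) {
      auto it = contents.find(m);
      if (it != contents.end()) reachable.insert(it->second.begin(), it->second.end());
    }
    for (const auto& m : orphaned_manifests) {
      auto it = contents.find(m);
      if (it == contents.end()) continue;
      for (const auto& f : it->second) {
        if (reachable.count(f) == 0) to_delete.insert(f);
      }
    }
  }

  // Step 4: the orphaned manifests themselves are deleted as well.
  to_delete.insert(orphaned_manifests.begin(), orphaned_manifests.end());
  return to_delete;
}

// Best-effort deletion: a failed delete is counted, never propagated.
int DeleteBestEffort(const std::set<std::string>& paths,
                     const std::function<bool(const std::string&)>& delete_fn) {
  int failures = 0;
  for (const auto& p : paths) {
    if (!delete_fn(p)) ++failures;
  }
  return failures;
}
```

In this sketch a data file shared between an orphaned manifest and a retained one survives, because step 3 subtracts the reachable set before anything is scheduled for deletion.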
   
   ## Test plan
   
   - [x] All 303 existing tests pass
   - [x] 9 expire snapshots tests pass (4 existing + 5 new)
   - [x] `CleanupLevelNoneSkipsFileDeletion` — verifies kNone skips all deletion
   - [x] `FinalizeSkippedOnCommitError` — verifies no cleanup on commit failure
   - [x] `FinalizeSkippedWhenNoSnapshotsExpired` — verifies no cleanup when 
nothing expired
   - [x] `DeleteWithCustomFunction` — verifies custom delete function is invoked
   - [x] `CommitWithCleanupLevelNone` — end-to-end commit with metadata update
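
The conditions these tests exercise can be summarized as a small gating function. This is a hedged sketch, not the `Finalize()` override from the PR: `CommitOutcome` and `FinalizeSketch` are hypothetical names, and the real override works with the library's own commit result and `FileIO` types.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the iceberg-cpp types.
enum class CleanupLevel { kNone, kMetadataOnly, kAll };

struct CommitOutcome {
  bool succeeded;                          // did the metadata commit succeed?
  std::vector<std::string> expired_files;  // files made unreachable by the commit
};

// Runs cleanup only when every precondition holds; returns the number of
// delete calls issued so a test can observe whether cleanup ran.
int FinalizeSketch(const CommitOutcome& outcome, CleanupLevel level,
                   const std::function<void(const std::string&)>& delete_fn) {
  if (!outcome.succeeded) return 0;             // no cleanup on commit failure
  if (outcome.expired_files.empty()) return 0;  // nothing expired
  if (level == CleanupLevel::kNone) return 0;   // caller opted out of cleanup
  int calls = 0;
  for (const auto& path : outcome.expired_files) {
    delete_fn(path);  // custom delete function supplied by the caller
    ++calls;
  }
  return calls;
}
```

A test in this style passes a delete function that records the paths it receives, then asserts on the recorded list.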
   
   Closes #364


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

