yihua commented on PR #9039:
URL: https://github.com/apache/hudi/pull/9039#issuecomment-1604718015

   > 2. getOldestInstantToRetainForCompaction also needs to check 
earliestInstantToRetain like getOldestInstantToRetainForClustering, to ensure 
that the files related to the compaction instant have been cleaned up before 
archiving.
   
   This is not a bug, and changes around it are unnecessary.  Compaction differs 
from clustering in that compaction does not add or delete any file group; it 
only writes a new file slice within an existing file group.  Clustering, in 
contrast, generates a replacecommit that replaces existing file groups with new 
ones, so cleaning has to delete the old file groups based on the information 
from the replacecommit in the active timeline.  Even if the compaction commit 
is archived, cleaning still behaves properly, because the old file slices that 
the compaction operation touched can still be identified and deleted.
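
To make the distinction concrete, here is a minimal toy model of the reasoning above. It is only a sketch and not Hudi code: `FileGroup`, `compact`, `cluster`, `clean_old_slices`, and `clean_replaced_groups` are hypothetical names invented for illustration, and the real cleaner is considerably more involved. It assumes that stale file slices can be found from the file listing alone, while replaced file groups can only be found through the replacecommit metadata.

```python
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    file_group_id: str
    # Instant times of the file slices in this group, oldest first.
    slice_instants: list = field(default_factory=list)


def compact(group, instant):
    # Compaction adds a new file slice to the SAME file group.
    group.slice_instants.append(instant)


def cluster(groups, replaced_ids, new_id, instant):
    # Clustering writes a NEW file group and records which groups it replaced
    # in the (toy) replacecommit metadata returned here.
    groups[new_id] = FileGroup(new_id, [instant])
    return {"instant": instant, "replaced": list(replaced_ids)}


def clean_old_slices(groups, retain=1):
    # Needs only the file listing: keep the latest `retain` slices per group.
    if retain < 1:
        raise ValueError("retain must be >= 1")
    deleted = []
    for group in groups.values():
        stale = group.slice_instants[:-retain]
        group.slice_instants = group.slice_instants[-retain:]
        deleted += [(group.file_group_id, instant) for instant in stale]
    return deleted


def clean_replaced_groups(groups, active_timeline):
    # Needs the replacecommits: without them, replaced groups look live.
    deleted = []
    for commit in active_timeline:
        for group_id in commit["replaced"]:
            if groups.pop(group_id, None) is not None:
                deleted.append(group_id)
    return deleted


groups = {"fg1": FileGroup("fg1", ["001"]), "fg2": FileGroup("fg2", ["001"])}
compact(groups["fg1"], "002")
replace_commit = cluster(groups, ["fg2"], "fg3", "003")

# Both commits archived: the active timeline is empty.
print(clean_old_slices(groups))           # stale slice of fg1 is still found
print(clean_replaced_groups(groups, []))  # fg2 is missed
print(clean_replaced_groups(groups, [replace_commit]))  # fg2 is found
```

In this model, archiving the compaction instant loses nothing the cleaner needs, whereas archiving the replacecommit before cleaning would leave `fg2` on storage.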


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]