zhuanshenbsj1 commented on code in PR #7405:
URL: https://github.com/apache/hudi/pull/7405#discussion_r1058708650
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -188,8 +188,12 @@ private List<String> getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata
 + "since last cleaned at " + cleanMetadata.getEarliestCommitToRetain()
 + ". New Instant to retain : " + newInstantToRetain);
 return hoodieTable.getCompletedCommitsTimeline().getInstantsAsStream().filter(
- instant -> HoodieTimeline.compareTimestamps(instant.getTimestamp(), HoodieTimeline.GREATER_THAN_OR_EQUALS,
- cleanMetadata.getEarliestCommitToRetain()) && HoodieTimeline.compareTimestamps(instant.getTimestamp(),
+ instant -> (HoodieTimeline.compareTimestamps(instant.getTimestamp(), HoodieTimeline.GREATER_THAN_OR_EQUALS,
+ cleanMetadata.getEarliestCommitToRetain())
+ || (instant.getMarkerFileModificationTimestamp().isPresent()
Review Comment:
> If an out-of-order replace commit finished before the clean start and the
> instant time of the replace commit is before the earliest commit to retain, it
> won't be cleaned and will be left in the timeline. Archiver will then archive it since
> its last modified time is earlier than the last clean in the timeline. What do
> you think?

You are right, it still won't clean the clustering instant in this scenario.
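
A minimal sketch of why the instant-time check alone skips such an instant. It is plain Java with hypothetical names (`IncrementalCleanSketch`, `selectByInstantTime`), not Hudi's API, and it assumes instant times compare as plain strings. It models only the `GREATER_THAN_OR_EQUALS` comparison from the diff, not the marker-file check added in this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalCleanSketch {

    // Hypothetical helper: keeps only the instant times that are >= earliestCommitToRetain,
    // mirroring the instant-time half of the filter in the diff above.
    static List<String> selectByInstantTime(List<String> completedInstantTimes,
                                            String earliestCommitToRetain) {
        return completedInstantTimes.stream()
            .filter(ts -> ts.compareTo(earliestCommitToRetain) >= 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "003" stands for the out-of-order replace commit: its instant time is
        // earlier than the earliest commit to retain ("005").
        List<String> completed = Arrays.asList("003", "005", "006");
        List<String> selected = selectByInstantTime(completed, "005");
        // "003" is filtered out, so it is never considered by this check.
        System.out.println(selected);
    }
}
```

Because the filter looks only at the instant time, the moment at which the replace commit actually completed plays no part in the decision.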