danny0405 commented on code in PR #17943:
URL: https://github.com/apache/hudi/pull/17943#discussion_r2767353800
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -154,12 +156,44 @@ public List<String> getPartitionPathsToClean(Option<HoodieInstant> earliestRetai
case KEEP_LATEST_BY_HOURS:
return getPartitionPathsForCleanByCommits(earliestRetainedInstant);
case KEEP_LATEST_FILE_VERSIONS:
+ if (canCleanBeSkipped()) {
+ return Collections.emptyList();
+ }
return getPartitionPathsForFullCleaning();
default:
throw new IllegalStateException("Unknown Cleaner Policy");
}
}
+ /**
+ * Returns true if the clean operation can be skipped entirely
+ * Basically it checks if last_compaction_timestamp < last_clean_timestamp and modified time of last completed compaction
+ * is less than modified time of last clean's requested instant
+ * In such cases, clean call can be skipped.
+ */
+ private boolean canCleanBeSkipped() {
+ if (!HoodieTableType.MERGE_ON_READ.equals(hoodieTable.getMetaClient().getTableType())) {
+ return false;
+ }
+ HoodieTimeline activeTimeline = hoodieTable.getActiveTimeline();
+ Option<HoodieInstant> lastCleanInstant = activeTimeline.getCleanerTimeline().lastInstant();
+ Option<HoodieInstant> lastCompactionInstant = getCommitTimeline()
+ .filter(instant -> instant.getAction().equals(HoodieTimeline.COMMIT_ACTION)).lastInstant();
+ if (!lastCompactionInstant.isPresent() || !lastCleanInstant.isPresent()) {
+ return false;
+ }
+
+ // Check whether there are any other commits apart from deltacommits between last compaction and last clean.
+ int nonDeltaCommitsBetweenCompactionAndClean = activeTimeline
Review Comment:
If this check is false:
```java
InstantComparison.compareTimestamps(lastCompactionInstant.get().getCompletionTime(),
InstantComparison.LESSER_THAN,
lastCleanInstant.get().requestedTime())
```
then I guess there is no need to even calculate
`nonDeltaCommitsBetweenCompactionAndClean`; the method can return `false` before scanning the timeline.
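
A minimal, self-contained sketch of that ordering, in case it helps. It does not use the Hudi classes: `Instant`, `countNonDeltaCommitsBetween`, and the `timelineScans` counter are hypothetical stand-ins, timestamps are compared as equal-length strings, and the final `== 0` condition is only an assumption about how the rest of the method (not visible in this hunk) uses the count.

```java
import java.util.Arrays;
import java.util.List;

class CleanSkipSketch {

  /** Hypothetical stand-in for a timeline instant; this is not Hudi's HoodieInstant. */
  static final class Instant {
    final String action;
    final String requestedTime;
    final String completionTime;

    Instant(String action, String requestedTime, String completionTime) {
      this.action = action;
      this.requestedTime = requestedTime;
      this.completionTime = completionTime;
    }
  }

  /** Counts timeline scans, only to make the short-circuit observable. */
  static int timelineScans = 0;

  /** Counts instants requested strictly between the two instants that are not deltacommits. */
  static long countNonDeltaCommitsBetween(List<Instant> timeline, Instant from, Instant to) {
    timelineScans++;
    return timeline.stream()
        .filter(i -> i.requestedTime.compareTo(from.requestedTime) > 0)
        .filter(i -> i.requestedTime.compareTo(to.requestedTime) < 0)
        .filter(i -> !"deltacommit".equals(i.action))
        .count();
  }

  static boolean canCleanBeSkipped(List<Instant> timeline, Instant lastCompaction, Instant lastClean) {
    // Cheap check first: the compaction must have completed before the clean was requested.
    if (lastCompaction.completionTime.compareTo(lastClean.requestedTime) >= 0) {
      return false; // no need to scan the timeline at all
    }
    // Assumed final condition: skip only if nothing but deltacommits happened in between.
    return countNonDeltaCommitsBetween(timeline, lastCompaction, lastClean) == 0;
  }

  public static void main(String[] args) {
    Instant compaction = new Instant("commit", "100", "105");
    Instant delta = new Instant("deltacommit", "107", "108");
    Instant clean = new Instant("clean", "110", "112");
    List<Instant> timeline = Arrays.asList(compaction, delta, clean);
    System.out.println("can skip: " + canCleanBeSkipped(timeline, compaction, clean));
    System.out.println("timeline scans: " + timelineScans);
  }
}
```

With the timestamp comparison first, the scan only runs when its result can still change the outcome.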