nsivabalan commented on code in PR #17943:
URL: https://github.com/apache/hudi/pull/17943#discussion_r2743560155
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -154,12 +156,44 @@ public List<String> getPartitionPathsToClean(Option<HoodieInstant> earliestRetai
case KEEP_LATEST_BY_HOURS:
return getPartitionPathsForCleanByCommits(earliestRetainedInstant);
case KEEP_LATEST_FILE_VERSIONS:
+ if (canCleanBeSkipped()) {
+ return Collections.emptyList();
+ }
return getPartitionPathsForFullCleaning();
default:
throw new IllegalStateException("Unknown Cleaner Policy");
}
}
+ /**
+ * Returns true if the clean operation can be skipped entirely
+ * Basically it checks if last_compaction_timestamp < last_clean_timestamp and modified time of last completed compaction
+ * is less than modified time of last clean's requested instant
+ * In such cases, clean call can be skipped.
+ */
+ private boolean canCleanBeSkipped() {
+ if (!HoodieTableType.MERGE_ON_READ.equals(hoodieTable.getMetaClient().getTableType())) {
+ return false;
+ }
+ HoodieTimeline activeTimeline = hoodieTable.getActiveTimeline();
+ Option<HoodieInstant> lastCleanInstant = activeTimeline.getCleanerTimeline().lastInstant();
+ Option<HoodieInstant> lastCompactionInstant = getCommitTimeline()
+ .filter(instant -> instant.getAction().equals(HoodieTimeline.COMMIT_ACTION)).lastInstant();
Review Comment:
same comment as above
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -154,12 +156,44 @@ public List<String> getPartitionPathsToClean(Option<HoodieInstant> earliestRetai
case KEEP_LATEST_BY_HOURS:
return getPartitionPathsForCleanByCommits(earliestRetainedInstant);
case KEEP_LATEST_FILE_VERSIONS:
+ if (canCleanBeSkipped()) {
+ return Collections.emptyList();
+ }
return getPartitionPathsForFullCleaning();
default:
throw new IllegalStateException("Unknown Cleaner Policy");
}
}
+ /**
+ * Returns true if the clean operation can be skipped entirely
+ * Basically it checks if last_compaction_timestamp < last_clean_timestamp and modified time of last completed compaction
+ * is less than modified time of last clean's requested instant
+ * In such cases, clean call can be skipped.
+ */
+ private boolean canCleanBeSkipped() {
+ if (!HoodieTableType.MERGE_ON_READ.equals(hoodieTable.getMetaClient().getTableType())) {
+ return false;
+ }
+ HoodieTimeline activeTimeline = hoodieTable.getActiveTimeline();
+ Option<HoodieInstant> lastCleanInstant = activeTimeline.getCleanerTimeline().lastInstant();
Review Comment:
Should we filter for completed instants as well?
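To illustrate why the filtering matters, here is a minimal, self-contained sketch. It does not use Hudi classes: `Instant`, `State`, `lastInstant`, and `sampleCleanTimeline` are hypothetical stand-ins, and the list is assumed to be ordered by timestamp. In the actual change this would presumably mean calling something like `filterCompletedInstants()` on the cleaner timeline before `lastInstant()`, but that mapping is an assumption about the intended fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class CompletedFilterSketch {
  // Hypothetical stand-in for the states an instant can be in.
  enum State { REQUESTED, INFLIGHT, COMPLETED }

  // Hypothetical stand-in for a timeline instant (not Hudi's HoodieInstant).
  static final class Instant {
    final String timestamp;
    final State state;

    Instant(String timestamp, State state) {
      this.timestamp = timestamp;
      this.state = state;
    }
  }

  // Returns the timestamp of the last instant in the (timestamp-ordered) list,
  // optionally considering only completed instants.
  static Optional<String> lastInstant(List<Instant> timeline, boolean completedOnly) {
    Optional<String> last = Optional.empty();
    for (Instant instant : timeline) {
      if (!completedOnly || instant.state == State.COMPLETED) {
        last = Optional.of(instant.timestamp);
      }
    }
    return last;
  }

  // A clean timeline whose most recent clean is still inflight.
  static List<Instant> sampleCleanTimeline() {
    return Arrays.asList(
        new Instant("001", State.COMPLETED),
        new Instant("003", State.INFLIGHT));
  }

  public static void main(String[] args) {
    List<Instant> cleans = sampleCleanTimeline();
    // Without filtering, the inflight clean "003" is picked up.
    System.out.println(lastInstant(cleans, false).get());
    // With filtering, only the completed clean "001" is considered.
    System.out.println(lastInstant(cleans, true).get());
  }
}
```

In this toy timeline, a compaction at "002" would look older than the last clean when the inflight "003" is included, so the clean could be skipped even though no completed clean has covered that compaction.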