nsivabalan commented on a change in pull request #4385:
URL: https://github.com/apache/hudi/pull/4385#discussion_r810018886
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,30 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitsSinceLastCleaning() {
+ Option<HoodieInstant> lastCleanInstant =
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+ HoodieTimeline commitTimeline =
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();
+
+ String latestCleanTs;
+ int numCommits = 0;
+ if (lastCleanInstant.isPresent()) {
+ latestCleanTs = lastCleanInstant.get().getTimestamp();
+ numCommits =
commitTimeline.findInstantsAfter(latestCleanTs).countInstants();
+ } else {
+ numCommits = commitTimeline.countInstants();
+ }
+
+ return numCommits;
+ }
+
+ private boolean needCleaning(CleaningTriggerStrategy strategy) {
Review comment:
Nit: rename this to "needsCleaning".
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitInfo() {
+ Option<HoodieInstant> lastCleanInstant =
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+ HoodieTimeline commitTimeline =
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();
Review comment:
Actually, you may be right. The cleaner will attempt to clean an entire file
slice only if there are compactions after it, i.e. only if there is another
"COMMIT". So we should be good here.
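For reference, here is a minimal standalone sketch of the counting logic in the hunk above, as I read it. It uses plain string timestamps instead of Hudi's timeline classes. The class and method names are hypothetical, not Hudi APIs. It assumes instant timestamps are fixed-width strings that order lexicographically, and that "after" means strictly after the last clean.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the timeline lookup; not part of Hudi.
public class CommitCounter {

    // Counts completed commits newer than the last completed clean.
    // If no clean has ever completed, every completed commit counts.
    public static int commitsSinceLastClean(List<String> completedCommitTimestamps,
                                            Optional<String> lastCleanTimestamp) {
        if (!lastCleanTimestamp.isPresent()) {
            return completedCommitTimestamps.size();
        }
        String cleanTs = lastCleanTimestamp.get();
        // Assumption: fixed-width timestamps, so string order matches time order.
        return (int) completedCommitTimestamps.stream()
            .filter(ts -> ts.compareTo(cleanTs) > 0)
            .count();
    }
}
```

On a merge-on-read table, the compaction "COMMIT" would simply show up as one more entry in the commit list, which is why counting only commits looks sufficient here.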