vinothchandar commented on a change in pull request #4385:
URL: https://github.com/apache/hudi/pull/4385#discussion_r773565504
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitInfo() {
Review comment:
rename: getCommitsSinceLastCleaning? and then name the variable
`numCommits`?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -128,6 +155,9 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context)
{
@Override
public Option<HoodieCleanerPlan> execute() {
+ if (!needCleaning(config.getInlineCleaningTriggerStrategy())) {
Review comment:
does it matter really if this is inline or async cleaning? since it
affects only the planning
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitInfo() {
+ Option<HoodieInstant> lastCleanInstant =
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+ HoodieTimeline commitTimeline =
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();
+
+ String latestCleanTs;
+ int commitsSinceLastCleaning = 0;
+ if (lastCleanInstant.isPresent()) {
+ latestCleanTs = lastCleanInstant.get().getTimestamp();
+ commitsSinceLastCleaning =
commitTimeline.findInstantsAfter(latestCleanTs).countInstants();
+ } else {
+ String firstCommitTs =
commitTimeline.firstInstant().get().getTimestamp();
+ commitsSinceLastCleaning =
commitTimeline.findInstantsAfterOrEquals(firstCommitTs,
Integer.MAX_VALUE).countInstants();
Review comment:
is n't this same as `commitTimeline.countInstants()`?
##########
File path:
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##########
@@ -143,6 +143,10 @@ public static String makeNewCommitTime(int sequence) {
return String.format("%09d", sequence);
}
+ public static String makeNewCommitTimeInHudiFormat(int sequence) {
Review comment:
Love to not add more methods here. can we get `makeNewCommitTime` work
as-is? cc @xushiyan
##########
File path:
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##########
@@ -143,6 +143,10 @@ public static String makeNewCommitTime(int sequence) {
return String.format("%09d", sequence);
}
+ public static String makeNewCommitTimeInHudiFormat(int sequence) {
Review comment:
In any case : rename `makeNewCommitTime(int ..)` ?
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitInfo() {
+ Option<HoodieInstant> lastCleanInstant =
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+ HoodieTimeline commitTimeline =
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();
Review comment:
do you want `getCommitsTimeline()`? instead? please check again what
actions are actually filtered across COW and MOR
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
##########
@@ -1148,7 +1167,9 @@ public void testKeepLatestCommits(boolean
simulateFailureRetry, boolean enableIn
.withIncrementalCleaningMode(enableIncrementalClean)
.withFailedWritesCleaningPolicy(HoodieFailedWritesCleaningPolicy.EAGER)
.withCleanBootstrapBaseFileEnabled(enableBootstrapSourceClean)
-
.withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS).retainCommits(2).build())
+ .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+ .retainCommits(2)
Review comment:
instead of overloading this test, can we write a new one on the
`TestCleanPlanExecutor` level? We are trying to discourage overloading
functional tests like these for functionality that can be tested more easily
(often times, its not possible). but in this case, I think it is.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -546,11 +568,6 @@ public Builder compactionSmallFileSize(long
smallFileLimitBytes) {
return this;
}
- public Builder compactionRecordSizeEstimateThreshold(double threshold) {
Review comment:
trying to understand why these are removed. There may be clients using
this outside this repo
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java
##########
@@ -58,8 +59,34 @@ public CleanPlanActionExecutor(HoodieEngineContext context,
this.extraMetadata = extraMetadata;
}
- protected Option<HoodieCleanerPlan> createCleanerPlan() {
- return execute();
+ private int getCommitInfo() {
+ Option<HoodieInstant> lastCleanInstant =
table.getActiveTimeline().getCleanerTimeline().filterCompletedInstants().lastInstant();
+ HoodieTimeline commitTimeline =
table.getActiveTimeline().getCommitTimeline().filterCompletedInstants();
+
+ String latestCleanTs;
+ int commitsSinceLastCleaning = 0;
+ if (lastCleanInstant.isPresent()) {
+ latestCleanTs = lastCleanInstant.get().getTimestamp();
+ commitsSinceLastCleaning =
commitTimeline.findInstantsAfter(latestCleanTs).countInstants();
+ } else {
+ String firstCommitTs =
commitTimeline.firstInstant().get().getTimestamp();
+ commitsSinceLastCleaning =
commitTimeline.findInstantsAfterOrEquals(firstCommitTs,
Integer.MAX_VALUE).countInstants();
+ }
+
+ return commitsSinceLastCleaning;
+ }
+
+ private boolean needCleaning(CleaningTriggerStrategy strategy) {
+ boolean cleaningNeeded;
+ if (strategy == CleaningTriggerStrategy.NUM_COMMITS) {
+ int numberOfCommits = getCommitInfo();
+ int maxInlineCommitsForNextClean = config.getInlineCleaningMaxCommits();
+ cleaningNeeded = numberOfCommits >= maxInlineCommitsForNextClean;
Review comment:
can we just return out from here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]