pratyakshsharma commented on code in PR #7041:
URL: https://github.com/apache/hudi/pull/7041#discussion_r1061583585


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -488,13 +497,14 @@ public Option<HoodieInstant> getEarliestCommitToRetain() {
     int hoursRetained = config.getCleanerHoursRetained();
     if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_COMMITS
         && commitTimeline.countInstants() > commitsRetained) {
-      earliestCommitToRetain = 
commitTimeline.nthInstant(commitTimeline.countInstants() - commitsRetained); 
//15 instants total, 10 commits to retain, this gives 6th instant in the list
+      earliestCommitToRetain =
+          commitTimeline.nthInstant(commitTimeline.countInstants() - 
commitsRetained); //15 instants total, 10 commits to retain, this gives 6th 
instant in the list- commitsRetained, 0));

Review Comment:
   Can you remove the changes in this comment here? Are they needed?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -146,10 +179,12 @@ HoodieCleanerPlan requestClean(HoodieEngineContext 
context) {
    */
   protected Option<HoodieCleanerPlan> requestClean(String startCleanTime) {
     final HoodieCleanerPlan cleanerPlan = requestClean(context);
-    if ((cleanerPlan.getFilePathsToBeDeletedPerPartition() != null)
+    // Create a clean request contains the cleaner plan if:
+    // - ALLOW_EMPTY_CLEAN_COMMITS is true
+    // - or the list of the file paths to be deleted is not empty
+    if (config.allowEmptyCleanCommits() || 
(cleanerPlan.getFilePathsToBeDeletedPerPartition() != null

Review Comment:
   we should probably add another check to validate that 
ALLOW_EMPTY_CLEAN_COMMITS is set to true only if incremental cleaning is 
enabled.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -38,10 +39,15 @@
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.BaseActionExecutor;
+

Review Comment:
   nit: remove the extra line.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -187,9 +187,18 @@ private List<String> 
getPartitionPathsForIncrementalCleaning(HoodieCleanMetadata
     LOG.info("Incremental Cleaning mode is enabled. Looking up partition-paths 
that have since changed "
         + "since last cleaned at " + cleanMetadata.getEarliestCommitToRetain()
         + ". New Instant to retain : " + newInstantToRetain);
+    String commitJustBeforeEarliestCommitToRetain = null;

Review Comment:
   Just wondering if these changes in this method are actually needed. Even if 
we keep the original logic, I guess we are not missing out on any scenario. 
Please correct me if I am missing anything here. Ultimately you are only 
getting the lastCheckedCommit and that is already taken care of in the original 
logic.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -86,7 +92,29 @@ private boolean needsCleaning(CleaningTriggerStrategy 
strategy) {
     if (strategy == CleaningTriggerStrategy.NUM_COMMITS) {
       int numberOfCommits = getCommitsSinceLastCleaning();
       int maxInlineCommitsForNextClean = config.getCleaningMaxCommits();
-      return numberOfCommits >= maxInlineCommitsForNextClean;
+      if (numberOfCommits >= maxInlineCommitsForNextClean) {
+        // check if the number of commits created after the last clean is 
greater than clean.max.commits
+        int commitsRetained = config.getCleanerCommitsRetained();
+        int hoursRetained = config.getCleanerHoursRetained();
+        if (config.getCleanerPolicy() == 
HoodieCleaningPolicy.KEEP_LATEST_COMMITS) {
+          // if cleaner policy is KEEP_LATEST_COMMITS then
+          // check if the number of completed commits in the timeline is 
greater than cleaner.commits.retained
+          return table.getCompletedCommitsTimeline().countInstants() > 
commitsRetained;

Review Comment:
   These checks are already present in CleanPlanner here - 
https://github.com/apache/hudi/blob/a0df6ecfb47ac9e393b5eb977850cb8fbe7d3f72/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L313.
 
   
   The new changes are redundant and we should remove the checks from any one 
place. I will leave this to you to decide. I feel we should remove the check 
from the place I mentioned since this makes more sense to keep them in this 
method here.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -86,7 +92,29 @@ private boolean needsCleaning(CleaningTriggerStrategy 
strategy) {
     if (strategy == CleaningTriggerStrategy.NUM_COMMITS) {
       int numberOfCommits = getCommitsSinceLastCleaning();
       int maxInlineCommitsForNextClean = config.getCleaningMaxCommits();
-      return numberOfCommits >= maxInlineCommitsForNextClean;
+      if (numberOfCommits >= maxInlineCommitsForNextClean) {
+        // check if the number of commits created after the last clean is 
greater than clean.max.commits
+        int commitsRetained = config.getCleanerCommitsRetained();
+        int hoursRetained = config.getCleanerHoursRetained();
+        if (config.getCleanerPolicy() == 
HoodieCleaningPolicy.KEEP_LATEST_COMMITS) {
+          // if cleaner policy is KEEP_LATEST_COMMITS then
+          // check if the number of completed commits in the timeline is 
greater than cleaner.commits.retained
+          return table.getCompletedCommitsTimeline().countInstants() > 
commitsRetained;
+        } else if (config.getCleanerPolicy() == 
HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
+          // if cleaner policy is KEEP_LATEST_BY_HOURS then
+          // check if there is a commit with timestamp older than current 
instant - cleaner.hours.retained
+          Instant instant = Instant.now();
+          ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, 
ZoneId.systemDefault());
+          String earliestTimeToRetain = 
HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(hoursRetained).toInstant()));
+          return 
table.getCompletedCommitsTimeline().getInstantsAsStream().filter(i -> 
HoodieTimeline.compareTimestamps(i.getTimestamp(),
+              HoodieTimeline.LESSER_THAN, earliestTimeToRetain)).count() > 0;
+        } else if (config.getCleanerPolicy() == 
HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS) {
+          // if cleaner policy is KEEP_LATEST_BY_HOURS then

Review Comment:
   nit: change the cleaner policy to KEEP_LATEST_FILE_VERSIONS.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -146,10 +179,12 @@ HoodieCleanerPlan requestClean(HoodieEngineContext 
context) {
    */
   protected Option<HoodieCleanerPlan> requestClean(String startCleanTime) {
     final HoodieCleanerPlan cleanerPlan = requestClean(context);
-    if ((cleanerPlan.getFilePathsToBeDeletedPerPartition() != null)
+    // Create a clean request contains the cleaner plan if:

Review Comment:
   nit: Create a clean request -> New clean request



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to