nsivabalan commented on a change in pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#discussion_r776108638



##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
##########
@@ -230,11 +230,15 @@ public HoodieCleanMetadata execute() {
         
.filterInflightsAndRequested().getInstants().collect(Collectors.toList());
     if (pendingCleanInstants.size() > 0) {
       pendingCleanInstants.forEach(hoodieInstant -> {
-        LOG.info("Finishing previously unfinished cleaner instant=" + 
hoodieInstant);
-        try {
-          cleanMetadataList.add(runPendingClean(table, hoodieInstant));
-        } catch (Exception e) {
-          LOG.warn("Failed to perform previous clean operation, instant: " + 
hoodieInstant, e);
+        if (table.getCleanTimeline().isEmpty(hoodieInstant)) {

Review comment:
       not to self: here for every clean, either inflight or requested will be 
returned (if not completed) depending on its current state. so, either we will 
delete the inflight instant only or delete the requested instant. 

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -146,11 +146,15 @@ public CleanPlanner(HoodieEngineContext context, 
HoodieTable<T, I, K, O> hoodieT
     if (config.incrementalCleanerModeEnabled()) {
       Option<HoodieInstant> lastClean = 
hoodieTable.getCleanTimeline().filterCompletedInstants().lastInstant();
       if (lastClean.isPresent()) {
-        HoodieCleanMetadata cleanMetadata = TimelineMetadataUtils
-            
.deserializeHoodieCleanMetadata(hoodieTable.getActiveTimeline().getInstantDetails(lastClean.get()).get());
-        if ((cleanMetadata.getEarliestCommitToRetain() != null)
-            && (cleanMetadata.getEarliestCommitToRetain().length() > 0)) {
-          return getPartitionPathsForIncrementalCleaning(cleanMetadata, 
instantToRetain);
+        if (hoodieTable.getActiveTimeline().isEmpty(lastClean.get())) {

Review comment:
       Note to self: so here we just delete the completed instant leaving 
requested and inflight as is in timeline. 

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
##########
@@ -230,11 +230,15 @@ public HoodieCleanMetadata execute() {
         
.filterInflightsAndRequested().getInstants().collect(Collectors.toList());
     if (pendingCleanInstants.size() > 0) {
       pendingCleanInstants.forEach(hoodieInstant -> {
-        LOG.info("Finishing previously unfinished cleaner instant=" + 
hoodieInstant);
-        try {
-          cleanMetadataList.add(runPendingClean(table, hoodieInstant));
-        } catch (Exception e) {
-          LOG.warn("Failed to perform previous clean operation, instant: " + 
hoodieInstant, e);
+        if (table.getCleanTimeline().isEmpty(hoodieInstant)) {

Review comment:
       what happens if inflight is empty, but requested is not. I understand we 
will delete the inflight here. but will the instant get re-attempted to be 
clean later (since the requested meta file is intact) ?

##########
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestCleaner.java
##########
@@ -750,6 +750,59 @@ public void testKeepLatestFileVersions(Boolean 
enableBootstrapSourceClean) throw
     assertTrue(testTable.baseFileExists(p0, "00000000000003", file3P0C2));
   }
 
+  @Test
+  public void testCleanEmptyInstants() throws Exception {
+    HoodieWriteConfig config =
+            HoodieWriteConfig.newBuilder()
+                    .withPath(basePath)
+                    
.withMetadataConfig(HoodieMetadataConfig.newBuilder().withAssumeDatePartitioning(true).build())
+                    .withCompactionConfig(HoodieCompactionConfig.newBuilder()
+                            
.withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS).retainFileVersions(1).build())
+                    .build();
+    metaClient = HoodieTableMetaClient.reload(metaClient);
+
+    int commitCount = 20;
+    int cleanCount = 20;
+
+    int startInstant = 1;
+    for (int i = 0; i < commitCount; i++, startInstant++) {
+      String commitTime = makeNewCommitTime(startInstant);
+      HoodieTestTable.of(metaClient).addCommit(commitTime);
+    }
+
+    for (int i = 0; i < cleanCount; i++, startInstant++) {
+      String commitTime = makeNewCommitTime(startInstant);
+      createCleanMetadata(commitTime + "", false, true);
+    }
+
+    List<HoodieCleanStat> cleanStats = runCleaner(config);
+    HoodieActiveTimeline timeline = metaClient.reloadActiveTimeline();
+
+    assertEquals(0, cleanStats.size(), "Must not clean any files");
+    assertEquals(1, timeline.getTimelineOfActions(

Review comment:
       I get it now, thanks a lot. I also ran the test locally and understood 
the behavior. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to