suryaprasanna commented on code in PR #8783:
URL: https://github.com/apache/hudi/pull/8783#discussion_r1225522025


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##########
@@ -1416,6 +1418,91 @@ public void 
testArchivalWithMaxDeltaCommitsGuaranteeForCompaction(boolean enable
     }
   }
 
+  @Test

Review Comment:
   Added.



##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java:
##########
@@ -1416,6 +1418,91 @@ public void 
testArchivalWithMaxDeltaCommitsGuaranteeForCompaction(boolean enable
     }
   }
 
+  @Test
+  public void testGetCommitInstantsToArchiveDuringInflightCommits() throws 
Exception {
+    HoodieWriteConfig cfg = initTestTableAndGetWriteConfig(true, 4, 5, 5);
+    HoodieTestDataGenerator.createCommitFile(basePath, "100", 
wrapperFs.getConf());
+
+    HoodieTestDataGenerator.createReplaceCommitRequestedFile(basePath, "101", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createReplaceCommitInflightFile(basePath, "101", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCommitFile(basePath, "102", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCommitFile(basePath, "103", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCompactionRequestedFile(basePath, "104", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCompactionAuxiliaryMetadata(basePath,
+        new HoodieInstant(State.REQUESTED, HoodieTimeline.COMPACTION_ACTION, 
"104"), wrapperFs.getConf());
+    HoodieTestDataGenerator.createCommitFile(basePath, "105", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCommitFile(basePath, "106", 
wrapperFs.getConf());
+    HoodieTestDataGenerator.createCommitFile(basePath, "107", 
wrapperFs.getConf());
+    HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
+    HoodieTimelineArchiver archiver = new HoodieTimelineArchiver(cfg, table);
+
+    HoodieTimeline timeline = 
metaClient.getActiveTimeline().getWriteTimeline();//getCommitsAndCompactionTimeline();
+    assertEquals(8, timeline.countInstants(), "Loaded 8 commits and the count 
should match");
+    boolean result = archiver.archiveIfRequired(context);
+    assertTrue(result);
+    timeline = 
metaClient.getActiveTimeline().reload().getWriteTimeline();//getCommitsAndCompactionTimeline();
+    assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "100")),
+        "At least one instant before oldest pending replacecommit need to stay 
in the timeline");
+    assertEquals(8, timeline.countInstants(),
+        "Since we have a pending replacecommit at 101, we should never archive 
any commit "
+            + "after 101 and also to maintain timeline have at least one 
completed commit before pending commit");
+    assertTrue(timeline.containsInstant(new HoodieInstant(State.INFLIGHT, 
HoodieTimeline.REPLACE_COMMIT_ACTION, "101")),
+        "Inflight replacecommit must still be present");
+    assertTrue(timeline.containsInstant(new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, "102")),

Review Comment:
   Sure, refactored the code.



##########
hudi-common/src/main/java/org/apache/hudi/common/util/ClusteringUtils.java:
##########
@@ -256,7 +256,12 @@ public static Option<HoodieInstant> 
getOldestInstantToRetainForClustering(
           retainLowerBound = earliestInstantToRetain.getTimestamp();
         } else {
           // no earliestInstantToRetain, indicate KEEP_LATEST_FILE_VERSIONS 
clean policy,
-          // retain first instant after clean instant
+          // retain first instant after clean instant.
+          // For KEEP_LATEST_FILE_VERSIONS cleaner policy, file versions are 
only maintained for active file groups
+          // not for replaced file groups. So, last clean instant can be 
considered as a lower bound, since
+          // the cleaner would have removed all the file groups until then. 
But there is a catch to this logic,
+          // while cleaner is running if there is a pending replacecommit then 
those files are not cleaned.
+          // TODO: This case has to be handled.

Review Comment:
   Created couple of tickets please review them.
   https://issues.apache.org/jira/browse/HUDI-6351
   https://issues.apache.org/jira/browse/HUDI-6352
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to