TheR1sing3un commented on code in PR #13606:
URL: https://github.com/apache/hudi/pull/13606#discussion_r2228231968


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:
##########
@@ -256,6 +260,25 @@ private List<HoodieInstant> getCommitInstantsToArchive() throws IOException {
       earliestInstantToRetainCandidates.add(qualifiedEarliestInstant);
     }
 
+    // 6. If archival should consider `earliest retain instant` in the clean plan,
+    // we should add the earliest retain instant from the clean plan to the candidates.
+    if (config.shouldArchiveKeepCleanPlanRetainInstant()) {

Review Comment:
   > > It is not equivalent to directly using `retain instant` as a candidate.
   > 
   > For example:
   > 
   > ts_0: delta_commit, ts_1: delta_commit, ts_2: replace commit, ts_3: clean (`retain instant`: ts_1)
   > 
   > now archive:
   > 
   > `ClusteringUtils.getEarliestInstantToRetainForClustering` will return `ts_2` as archival retain candidate.
   > 
   > But I want to regard `ts_1` as the archival retain candidate, because if the archiver archives `ts_1`, the next cleaning will fall back to a full table scan.
   
   In this example, a `replace commit` exists. The core question is whether archival should retain everything from the clean plan's `retain instant` itself, rather than from the first instant that is greater than or equal to that `retain instant`.
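
To make the difference concrete, here is a minimal sketch of the two choices on the example timeline. It is a toy model only: `RetainCandidateSketch`, `Instant`, `firstReplaceAtOrAfter`, and `archivableBefore` are hypothetical names, not Hudi classes or methods, and the lookup is a simplified stand-in for the behavior described above rather than the actual `ClusteringUtils` logic. It also assumes the timeline is sorted by timestamp and that every instant strictly before the chosen boundary is archivable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class RetainCandidateSketch {

  // Hypothetical stand-in for a timeline entry; this is not Hudi's HoodieInstant.
  static final class Instant {
    final String ts;
    final String action;

    Instant(String ts, String action) {
      this.ts = ts;
      this.action = action;
    }
  }

  // The timeline from the example in this thread, assumed sorted by timestamp.
  static List<Instant> exampleTimeline() {
    return Arrays.asList(
        new Instant("ts_0", "deltacommit"),
        new Instant("ts_1", "deltacommit"),
        new Instant("ts_2", "replacecommit"),
        new Instant("ts_3", "clean"));
  }

  // Simplified model of "first replace commit >= retain instant".
  // Assumption: this only mirrors the behavior described in the thread.
  static Optional<String> firstReplaceAtOrAfter(List<Instant> timeline, String retainTs) {
    return timeline.stream()
        .filter(i -> i.action.equals("replacecommit"))
        .map(i -> i.ts)
        .filter(ts -> ts.compareTo(retainTs) >= 0)
        .findFirst();
  }

  // Toy archival rule: everything strictly before the boundary may be archived.
  static List<String> archivableBefore(List<Instant> timeline, String boundaryTs) {
    return timeline.stream()
        .map(i -> i.ts)
        .filter(ts -> ts.compareTo(boundaryTs) < 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Instant> timeline = exampleTimeline();
    String retainTs = "ts_1"; // `retain instant` recorded by the clean at ts_3

    // Choice A: boundary is the first replace commit at or after the retain instant.
    String boundaryA = firstReplaceAtOrAfter(timeline, retainTs).orElse(retainTs);
    System.out.println(archivableBefore(timeline, boundaryA)); // [ts_0, ts_1]

    // Choice B: boundary is the retain instant itself.
    System.out.println(archivableBefore(timeline, retainTs)); // [ts_0]
  }
}
```

Under these assumptions, choice A lets `ts_1` be archived, while choice B keeps it on the active timeline, which is the case the quoted comment is concerned about.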



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

