TheR1sing3un commented on code in PR #13606:
URL: https://github.com/apache/hudi/pull/13606#discussion_r2228522065


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:
##########
@@ -256,6 +260,25 @@ private List<HoodieInstant> getCommitInstantsToArchive() throws IOException {
       earliestInstantToRetainCandidates.add(qualifiedEarliestInstant);
     }
 
+    // 6. If archival should consider `earliest retain instant` in the clean plan,
+    // we should add the earliest retain instant from the clean plan to the candidates.
+    if (config.shouldArchiveKeepCleanPlanRetainInstant()) {

Review Comment:
   > `retain in clean plan` is just a marker for the left boundary of
   > incremental cleaning. It is not a constraint on archiving in general; this
   > variable was mainly introduced to resolve clustering data duplication issues, as
   > in the doc: "the clustering instant won't be archived before cleaned, and the
   > earliest inflight clustering instant has a previous commit"
   
   Yes, you're right. All I want to do is build on that by retaining a few more
instants in the active timeline, so that the incremental partition scan does not
fall back to a full table scan during cleaning. This does not seem to disrupt any
of the original logic; it only improves cleaning performance while staying
compatible with the existing behavior.
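
To make the intent concrete, here is a minimal, self-contained sketch of the idea, not the actual Hudi code. It assumes that the archival boundary is the minimum over the collected candidates, and it models instants as fixed-width timestamp strings. The class name `ArchiveBoundarySketch`, the method `earliestInstantToRetain`, and its parameters are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ArchiveBoundarySketch {

  // Hypothetical helper (not a Hudi API). Instants are modeled as fixed-width
  // timestamp strings, so lexicographic order matches timeline order.
  static Optional<String> earliestInstantToRetain(List<String> existingCandidates,
                                                  Optional<String> cleanPlanEarliestRetain,
                                                  boolean keepCleanPlanRetainInstant) {
    List<String> candidates = new ArrayList<>(existingCandidates);
    // Only when the flag is on does the clean plan's earliest retained
    // instant become one more candidate; existing candidates are untouched.
    if (keepCleanPlanRetainInstant) {
      cleanPlanEarliestRetain.ifPresent(candidates::add);
    }
    // Assumption: archival keeps everything at or after the earliest candidate.
    return candidates.stream().min(String::compareTo);
  }

  public static void main(String[] args) {
    List<String> existing = List.of("20240105000000", "20240107000000");
    Optional<String> cleanRetain = Optional.of("20240103000000");

    // Flag off: the boundary comes from the existing candidates only.
    System.out.println(earliestInstantToRetain(existing, cleanRetain, false).get());
    // Flag on: the boundary can only move earlier, so more instants stay active.
    System.out.println(earliestInstantToRetain(existing, cleanRetain, true).get());
  }
}
```

Because the new candidate is only ever added to the set, the boundary can move earlier but never later under this model, which is why the change should stay compatible with the existing constraints.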



