TheR1sing3un commented on code in PR #13606:
URL: https://github.com/apache/hudi/pull/13606#discussion_r2228231968
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:
##########
@@ -256,6 +260,25 @@ private List<HoodieInstant> getCommitInstantsToArchive() throws IOException {
earliestInstantToRetainCandidates.add(qualifiedEarliestInstant);
}
+ // 6. If archival should consider `earliest retain instant` in the clean plan,
+ // we should add the earliest retain instant from the clean plan to the candidates.
+ if (config.shouldArchiveKeepCleanPlanRetainInstant()) {
Review Comment:
> > It is not equivalent to directly using `retain instant` as a candidate.
>
> For example:
>
> - ts_0: delta_commit
> - ts_1: delta_commit
> - ts_2: replace commit
> - ts_3: clean, `retain instant`: ts_1
>
> now archive:
>
> `ClusteringUtils.getEarliestInstantToRetainForClustering` will return `ts_2` as archival retain candidate.
>
> But I want to regard `ts_1` as archival retain candidate, because if the archiver archived `ts_1`, the next cleaning will fallback to full table scan.

In this example, a `replace commit` exists. The core question is whether archival should retain the timeline from the clean plan's `retain instant` itself, rather than from the first instant found that is greater than or equal to that `retain instant`.
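
To make the distinction concrete, here is a minimal sketch of the two candidate choices on the example timeline above. It is a simplified model, not Hudi code: `RetainCandidateSketch`, `TimelineEntry`, `firstReplaceAtOrAfter`, and `exactRetainInstant` are hypothetical names, and the assumption that the clustering-based lookup behaves like "first replace commit at or after the retain instant" is only my reading of the example, not a description of `ClusteringUtils`.

```java
import java.util.List;
import java.util.Optional;

class RetainCandidateSketch {

    /** Hypothetical, simplified stand-in for a timeline instant; not Hudi's HoodieInstant. */
    static final class TimelineEntry {
        final String ts;
        final String action;

        TimelineEntry(String ts, String action) {
            this.ts = ts;
            this.action = action;
        }
    }

    /** Assumed model of the existing behavior: earliest replace commit at or after retainTs. */
    static Optional<String> firstReplaceAtOrAfter(List<TimelineEntry> timeline, String retainTs) {
        return timeline.stream()
                .filter(e -> "replacecommit".equals(e.action))
                .map(e -> e.ts)
                .filter(ts -> ts.compareTo(retainTs) >= 0)
                .min(String::compareTo);
    }

    /** Proposed behavior in this sketch: use the clean plan's retain instant itself, if present. */
    static Optional<String> exactRetainInstant(List<TimelineEntry> timeline, String retainTs) {
        return timeline.stream()
                .map(e -> e.ts)
                .filter(ts -> ts.equals(retainTs))
                .findFirst();
    }

    /** The timeline from the example: the clean at ts_3 records ts_1 as its retain instant. */
    static List<TimelineEntry> exampleTimeline() {
        return List.of(
                new TimelineEntry("ts_0", "deltacommit"),
                new TimelineEntry("ts_1", "deltacommit"),
                new TimelineEntry("ts_2", "replacecommit"),
                new TimelineEntry("ts_3", "clean"));
    }

    public static void main(String[] args) {
        List<TimelineEntry> timeline = exampleTimeline();
        String retainTs = "ts_1"; // retain instant recorded in the clean plan

        System.out.println("replace-based candidate: " + firstReplaceAtOrAfter(timeline, retainTs));
        System.out.println("exact retain candidate:  " + exactRetainInstant(timeline, retainTs));
    }
}
```

Under these assumptions the two lookups disagree on the example timeline: the replace-based one lands on `ts_2`, so `ts_1` becomes eligible for archival, while the exact one keeps `ts_1` as the candidate.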
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.