fhan688 commented on code in PR #19041:
URL: https://github.com/apache/hudi/pull/19041#discussion_r3445691683
##########
hudi-common/src/main/java/org/apache/hudi/common/util/CleanerUtils.java:
##########
@@ -151,8 +152,16 @@ public static Option<HoodieInstant>
getEarliestCommitToRetain(
} else if (cleaningPolicy == HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
ZonedDateTime latestDateTime = ZonedDateTime.ofInstant(latestInstant,
timeZone.getZoneId());
String earliestTimeToRetain =
TimelineUtils.formatDate(Date.from(latestDateTime.minusHours(hoursRetained).toInstant()));
- earliestCommitToRetain =
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream().filter(i
-> compareTimestamps(i.requestedTime(),
- GREATER_THAN_OR_EQUALS, earliestTimeToRetain)).findFirst());
+ earliestCommitToRetain =
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream()
+ .filter(i -> compareTimestamps(i.requestedTime(),
GREATER_THAN_OR_EQUALS, earliestTimeToRetain))
Review Comment:
> This still selects the first instant `>=` the cutoff, but the PR
description says the fix is to pick the instant _at or before_ the cutoff. That
change isn't here, so the sparse-commit-density bug described in the issue
isn't actually addressed — only the new pending-instant guard below is. Please
reconcile the description with the code (or add the intended change).
Good catch. The code should keep selecting the first completed instant `>=`
the cutoff. That is the existing clean-by-hours boundary semantics used by
`CleanPlanner`, where `ECTR` is the first retained instant. Selecting an
instant at or before the cutoff would move the boundary too far back and breaks
the existing clean plan expectations.
The PR description was inaccurate here. I will update it to say that this
patch preserves the `>=` cutoff selection and only adds pending-instant
protection so the by-hours `ECTR` does not move past the earliest pending
instant.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]