fhan688 commented on code in PR #19041:
URL: https://github.com/apache/hudi/pull/19041#discussion_r3445691683


##########
hudi-common/src/main/java/org/apache/hudi/common/util/CleanerUtils.java:
##########
@@ -151,8 +152,16 @@ public static Option<HoodieInstant> 
getEarliestCommitToRetain(
     } else if (cleaningPolicy == HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
       ZonedDateTime latestDateTime = ZonedDateTime.ofInstant(latestInstant, 
timeZone.getZoneId());
       String earliestTimeToRetain = 
TimelineUtils.formatDate(Date.from(latestDateTime.minusHours(hoursRetained).toInstant()));
-      earliestCommitToRetain = 
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream().filter(i 
-> compareTimestamps(i.requestedTime(),
-          GREATER_THAN_OR_EQUALS, earliestTimeToRetain)).findFirst());
+      earliestCommitToRetain = 
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream()
+          .filter(i -> compareTimestamps(i.requestedTime(), 
GREATER_THAN_OR_EQUALS, earliestTimeToRetain))

Review Comment:
   > This still selects the first instant `>=` the cutoff, but the PR 
description says the fix is to pick the instant _at or before_ the cutoff. That 
change isn't here, so the sparse-commit-density bug described in the issue 
isn't actually addressed — only the new pending-instant guard below is. Please 
reconcile the description with the code (or add the intended change).
   
   Good catch. The code should keep selecting the first completed instant `>=` 
the cutoff. That is the existing clean-by-hours boundary semantics used by 
`CleanPlanner`, where `ECTR` is the first retained instant. Selecting an 
instant at or before the cutoff would move the boundary too far back and breaks 
the existing clean plan expectations.
   
     The PR description was inaccurate here. I will update it to say that this 
patch preserves the `>=` cutoff selection and only adds pending-instant 
protection so the by-hours `ECTR` does not move past the earliest pending 
instant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to