hudi-bot opened a new issue, #15925:
URL: https://github.com/apache/hudi/issues/15925

   When the cleaner policy is based on hours (KEEP_LATEST_BY_HOURS), we estimate 
the earliest commit to retain using the JVM's default time zone rather than UTC 
or the time zone used to generate the commit times. If the two differ, the 
cutoff is miscalculated, which can lead to deleting more file slices than intended. 
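
   To illustrate the skew, here is a minimal, self-contained sketch (not Hudi code). It assumes commit times use a `yyyyMMddHHmmssSSS` layout and are compared as strings; the class name `CleanerZoneSkew`, the helper `cutoff`, and the example zones and timestamps are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class CleanerZoneSkew {
    // Assumed commit-time layout; adjust if the table uses a different one.
    static final DateTimeFormatter COMMIT_FORMAT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Formats "now minus hoursRetained" as a commit-style timestamp in the given zone.
    static String cutoff(Instant now, int hoursRetained, ZoneId zone) {
        return ZonedDateTime.ofInstant(now, zone)
            .minusHours(hoursRetained)
            .format(COMMIT_FORMAT);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-05-01T12:00:00Z");

        // Commit written 3.5 hours ago by a writer that generates commit times in UTC.
        String commitTime = "20230501083000000";

        String utcCutoff = cutoff(now, 6, ZoneId.of("UTC"));
        String localCutoff = cutoff(now, 6, ZoneId.of("Asia/Kolkata"));

        // With the UTC cutoff the commit is inside the 6-hour retention window.
        System.out.println(commitTime.compareTo(utcCutoff) >= 0);   // true
        // With a cutoff formatted in a UTC+05:30 zone the same commit falls before it.
        System.out.println(commitTime.compareTo(localCutoff) >= 0); // false
    }
}
```

   In this example a commit that is only 3.5 hours old sorts before the locally computed cutoff, so with a 6-hour retention its file slices would become eligible for cleaning.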
   
    
   
   Ref: 
[https://github.com/apache/hudi/blob/c6760772f8dc62eb44c45b022ed07858d895d804/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java#L511]
   
    
   {code:java}
   else if (config.getCleanerPolicy() == HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
     Instant instant = Instant.now();
     ZonedDateTime currentDateTime = ZonedDateTime.ofInstant(instant, ZoneId.systemDefault());
     String earliestTimeToRetain =
         HoodieActiveTimeline.formatDate(Date.from(currentDateTime.minusHours(hoursRetained).toInstant()));
     earliestCommitToRetain = Option.fromJavaOptional(
         commitTimeline.getInstantsAsStream()
             .filter(i -> HoodieTimeline.compareTimestamps(
                 i.getTimestamp(), HoodieTimeline.GREATER_THAN_OR_EQUALS, earliestTimeToRetain))
             .findFirst());
   }
   {code}
    
   
    
   
   Potential fixes:
   
   - Compute the cutoff time using the time zone set in the table config. 
   
   - Fetch the latest completed commit and derive the earliest commit to retain 
relative to its timestamp.
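
   Both options can be sketched in plain `java.time`, independent of Hudi's classes. This is only a rough illustration under assumptions: the commit-time layout is taken to be `yyyyMMddHHmmss` plus three millisecond digits, and the class `CleanerCutoffSketch` and both method names are hypothetical, not existing Hudi APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class CleanerCutoffSketch {
    // Assumed commit-time layout: yyyyMMddHHmmss followed by 3-digit millis.
    // Built with appendValue so that parsing also works on Java 8.
    static final DateTimeFormatter COMMIT_FORMAT = new DateTimeFormatterBuilder()
        .appendPattern("yyyyMMddHHmmss")
        .appendValue(ChronoField.MILLI_OF_SECOND, 3)
        .toFormatter();

    // Option 1: format the cutoff in the zone that is used to generate commit
    // times (passed in explicitly here instead of ZoneId.systemDefault()).
    static String cutoffInZone(Instant now, int hoursRetained, ZoneId commitZone) {
        Instant cutoff = now.minus(Duration.ofHours(hoursRetained));
        return LocalDateTime.ofInstant(cutoff, commitZone).format(COMMIT_FORMAT);
    }

    // Option 2: anchor the cutoff on the latest completed commit time, so the
    // cleaner's own clock and zone are not consulted at all.
    static String cutoffFromLatestCommit(String latestCommitTime, int hoursRetained) {
        return LocalDateTime.parse(latestCommitTime, COMMIT_FORMAT)
            .minusHours(hoursRetained)
            .format(COMMIT_FORMAT);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2023-05-01T12:00:00Z");
        System.out.println(cutoffInZone(now, 6, ZoneId.of("UTC")));
        System.out.println(cutoffFromLatestCommit("20230501113000000", 6));
    }
}
```

   The second option subtracts hours on the wall-clock value of the latest commit, so it ignores daylight-saving transitions; whether that matters depends on the zone the writers use.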
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6155
   - Type: Bug
   - Fix version(s):
     - 1.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
