[ 
https://issues.apache.org/jira/browse/HUDI-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6352:
---------------------------------
    Labels: pull-request-available  (was: )

> KEEP_LATEST_BY_HOURS should consider modified time instead of commit time 
> while setting earliestCommitToRetain value
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-6352
>                 URL: https://issues.apache.org/jira/browse/HUDI-6352
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Surya Prasanna Yalla
>            Priority: Major
>              Labels: pull-request-available
>
> In CleanPlanner, KEEP_LATEST_BY_HOURS sets the earliestCommitToRetain value
> by considering the commit timestamp directly. This introduces a bug when
> there are out-of-order commits, where a commit with a lower timestamp
> completes much later than commits with higher timestamps.
> This policy's implementation needs to be revisited.
> It should store the timestamp up to which it has cleaned; let this be t1.
> The next cleaner instant should consider all the partitions and files that
> were modified between t1 and (current time - x hours). Any files that are
> no longer valid should be removed.
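
The difference between the two windows described above can be sketched as
follows. This is only an illustration under stated assumptions, not Hudi's
CleanPlanner code: the class CleanWindowSketch, the Commit type, and both
window methods are hypothetical names, and times are plain hour counts
rather than Hudi instant timestamps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not Hudi APIs.
final class CleanWindowSketch {

    // One completed commit. Times are plain hour counts to keep the example small.
    static final class Commit {
        final String id;
        final long commitTime;      // timestamp the instant was created with
        final long completionTime;  // when the commit actually finished

        Commit(String id, long commitTime, long completionTime) {
            this.id = id;
            this.commitTime = commitTime;
            this.completionTime = completionTime;
        }
    }

    // Window keyed on commit time: commits with t1 < commitTime <= now - x.
    static List<String> windowByCommitTime(List<Commit> timeline, long t1, long now, long x) {
        List<String> ids = new ArrayList<>();
        for (Commit c : timeline) {
            if (c.commitTime > t1 && c.commitTime <= now - x) {
                ids.add(c.id);
            }
        }
        return ids;
    }

    // Window keyed on completion time: commits with t1 < completionTime <= now - x.
    static List<String> windowByCompletionTime(List<Commit> timeline, long t1, long now, long x) {
        List<String> ids = new ArrayList<>();
        for (Commit c : timeline) {
            if (c.completionTime > t1 && c.completionTime <= now - x) {
                ids.add(c.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Commit A has the lowest commit time but completes last (out of order).
        List<Commit> timeline = List.of(
            new Commit("A", 1, 10),
            new Commit("B", 5, 6),
            new Commit("C", 8, 9));
        long t1 = 7, now = 36, x = 24; // previous clean covered up to hour 7; retain 24 hours

        System.out.println(windowByCommitTime(timeline, t1, now, x));     // prints [C]
        System.out.println(windowByCompletionTime(timeline, t1, now, x)); // prints [A, C]
    }
}
```

In this example commit A falls before t1 by commit time, so a window keyed
on commit time never revisits the files it wrote, while a window keyed on
completion time picks it up on the next clean.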



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
