[
https://issues.apache.org/jira/browse/HUDI-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6352:
---------------------------------
Labels: pull-request-available (was: )
> KEEP_LATEST_BY_HOURS should consider modified time instead of commit time
> while setting earliestCommitToRetain value
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-6352
> URL: https://issues.apache.org/jira/browse/HUDI-6352
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Surya Prasanna Yalla
> Priority: Major
> Labels: pull-request-available
>
> In CleanPlanner, the KEEP_LATEST_BY_HOURS policy sets earliestCommitToRetain
> by comparing commit timestamps directly. This introduces a bug when commits
> complete out of order, i.e. when a commit with a lower timestamp completes
> much later than commits with higher timestamps.
> This policy's implementation needs to be revisited.
> It should store the timestamp up to which it has already cleaned; call this
> t1. The next cleaner instant should consider all partitions and files that
> were modified between t1 and (current time - x hours). Any of those files
> that are no longer valid should be removed.
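
The out-of-order case described in the issue can be illustrated with a minimal Java sketch. It does not use Hudi's CleanPlanner or timeline classes: CommitInfo and both helper methods are hypothetical names, and the sketch assumes each commit records a requested (commit) time and a completion time. It covers only the choice of earliestCommitToRetain, not the stored t1 checkpoint.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Hudi code: compares two ways of picking the
// earliest commit to retain for an "hours retained" cleaning policy.
class CleanRetentionSketch {

    // Hypothetical stand-in for a timeline instant.
    static final class CommitInfo {
        final String id;
        final Instant requestedTime;  // the commit timestamp
        final Instant completedTime;  // when the commit actually finished

        CommitInfo(String id, Instant requestedTime, Instant completedTime) {
            this.id = id;
            this.requestedTime = requestedTime;
            this.completedTime = completedTime;
        }
    }

    // Behaviour described in the issue: only the commit timestamp is
    // compared against the cutoff (now - x hours).
    static Optional<CommitInfo> earliestToRetainByCommitTime(
            List<CommitInfo> commits, Instant cutoff) {
        return commits.stream()
                .filter(c -> !c.requestedTime.isBefore(cutoff))
                .min(Comparator.comparing((CommitInfo c) -> c.requestedTime));
    }

    // Assumed alternative: a commit counts as recent if it completed at or
    // after the cutoff, even if its commit timestamp is older.
    static Optional<CommitInfo> earliestToRetainByCompletionTime(
            List<CommitInfo> commits, Instant cutoff) {
        return commits.stream()
                .filter(c -> !c.completedTime.isBefore(cutoff))
                .min(Comparator.comparing((CommitInfo c) -> c.requestedTime));
    }
}
```

With a cutoff of 10:00, a commit requested at 08:00 but completed at 11:30 is ignored by the first method and picked up by the second, which is the situation the issue describes.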