Surya Prasanna Yalla created HUDI-6352:
------------------------------------------
Summary: KEEP_LATEST_BY_HOURS should consider modified time
instead of commit time while setting earliestCommitToRetain value
Key: HUDI-6352
URL: https://issues.apache.org/jira/browse/HUDI-6352
Project: Apache Hudi
Issue Type: Bug
Reporter: Surya Prasanna Yalla
In CleanPlanner, the KEEP_LATEST_BY_HOURS policy sets earliestCommitToRetain
by comparing commit timestamps directly. This introduces a bug when commits
complete out of order, i.e. when a commit with a lower timestamp completes
much later than commits with higher timestamps.
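
To make the failure mode concrete, here is a minimal sketch in Python. It is
a simplified model and not Hudi's CleanPlanner code: the Commit class, its
field names, and both helper functions are hypothetical. It only shows that
filtering by the commit's own timestamp and filtering by the time the commit
actually completed can disagree when a commit finishes late.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    # Hypothetical model of a commit, not a Hudi class.
    instant_time: datetime     # timestamp assigned when the commit started
    completion_time: datetime  # when the commit actually finished


def retained_by_instant_time(commits, now, hours):
    """Keep commits whose start timestamp falls inside the last `hours`."""
    cutoff = now - timedelta(hours=hours)
    return [c for c in commits if c.instant_time >= cutoff]


def retained_by_completion_time(commits, now, hours):
    """Keep commits that finished (modified files) inside the last `hours`."""
    cutoff = now - timedelta(hours=hours)
    return [c for c in commits if c.completion_time >= cutoff]


day = datetime(2023, 6, 1)
# Started at 01:00 but only completed at 10:00 (out-of-order completion).
slow = Commit(day.replace(hour=1), day.replace(hour=10))
fast = Commit(day.replace(hour=2), day.replace(hour=3))
recent = Commit(day.replace(hour=8), day.replace(hour=9))
commits = [slow, fast, recent]

now = day.replace(hour=11)  # with hours=5 the cutoff is 06:00
by_instant = retained_by_instant_time(commits, now, hours=5)
by_completion = retained_by_completion_time(commits, now, hours=5)
```

In this model, `slow` wrote its files one hour before `now`, yet the
timestamp-based filter treats it as older than the 5-hour window, while the
completion-based filter keeps it.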
This policy's implementation needs to be revisited.
It should store the timestamp up to which it has already cleaned; call this
t1. The next cleaner instant should then consider all partitions and files
modified between t1 and (current time - x hours). Any of those files that
are no longer valid should be removed.
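
The proposal above could be sketched roughly as follows, again in Python and
again as a simplified model rather than Hudi code. FileVersion, plan_clean,
and all field names are hypothetical. The sketch also assumes that "not
valid" means "superseded by a newer version of the same file group as of the
window's upper bound"; the issue itself does not define validity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileVersion:
    # Hypothetical model of one version of a file group, not a Hudi class.
    partition: str
    file_id: str
    modified_time: datetime


def plan_clean(files, last_cleaned_until, now, hours):
    """Return (files_to_delete, new_last_cleaned_until).

    Only file groups modified in (last_cleaned_until, now - hours] are
    examined. For each of them, every version except the latest one as of
    the upper bound is treated as no longer valid (an assumption).
    """
    upper = now - timedelta(hours=hours)
    touched = {
        (f.partition, f.file_id)
        for f in files
        if last_cleaned_until < f.modified_time <= upper
    }
    to_delete = []
    for key in touched:
        versions = sorted(
            (f for f in files
             if (f.partition, f.file_id) == key and f.modified_time <= upper),
            key=lambda f: f.modified_time,
        )
        to_delete.extend(versions[:-1])  # keep only the latest version
    to_delete.sort(key=lambda f: (f.partition, f.file_id, f.modified_time))
    # Never move the stored boundary t1 backwards.
    return to_delete, max(last_cleaned_until, upper)


day = datetime(2023, 6, 1)
v1 = FileVersion("p1", "f1", day.replace(hour=1))
v2 = FileVersion("p1", "f1", day.replace(hour=4))
v3 = FileVersion("p1", "f1", day.replace(hour=9))
other = FileVersion("p2", "f2", day.replace(hour=2))

# First run at 11:00 with x = 5 hours: window is (00:00, 06:00].
deleted_1, t1 = plan_clean([v1, v2, v3, other], day, day.replace(hour=11), 5)
# Second run at 16:00 on the remaining files: window is (06:00, 11:00].
deleted_2, t2 = plan_clean([v2, v3, other], t1, day.replace(hour=16), 5)
```

Because the window is driven by modification time and the stored boundary
t1, a file written by a late-completing commit is still picked up by the
next run instead of being skipped or judged by its original timestamp.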
--
This message was sent by Atlassian Jira
(v8.20.10#820010)