Surya Prasanna Yalla created HUDI-6352:
------------------------------------------

             Summary: KEEP_LATEST_BY_HOURS should consider modified time 
instead of commit time while setting earliestCommitToRetain value
                 Key: HUDI-6352
                 URL: https://issues.apache.org/jira/browse/HUDI-6352
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Surya Prasanna Yalla


In CleanPlanner, KEEP_LATEST_BY_HOURS sets the earliestCommitToRetain value
by comparing commit timestamps directly. This introduces a bug when commits
complete out of order, i.e. when a commit with a lower timestamp completes
much later than commits with higher timestamps.

This policy's implementation needs to be revisited.

Instead, the cleaner should store the timestamp up to which it has already
cleaned; call this t1. The next cleaner instant should then consider all
partitions and files modified between t1 and (current time - x hours), where
x is the number of hours retained, and remove whichever files are no longer
valid.
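A minimal sketch of the proposed watermark approach, again as a
hypothetical Python model rather than CleanPlanner code. The function name
plan_clean, the Commit type, and the use of completion time as the
"modified time" are assumptions. Each run inspects only commits that
completed in (t1, now - x hours] and returns the new watermark to store;
deciding which file versions in those partitions are no longer valid is
left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    name: str
    completion_time: datetime  # used as the "modified time" of the files the commit wrote
    partitions: frozenset


def plan_clean(commits, cleaned_until, now, hours):
    """Plan one cleaner run (hypothetical model).

    Returns the partitions touched by commits that completed in the window
    (cleaned_until, now - hours], plus the watermark to store for the next run.
    """
    upper = now - timedelta(hours=hours)
    partitions = set()
    for commit in commits:
        if cleaned_until < commit.completion_time <= upper:
            partitions |= commit.partitions
    # Never move the watermark backwards (e.g. if the retention hours are increased).
    return partitions, max(cleaned_until, upper)


commits = [
    Commit("c1", datetime(2023, 6, 1, 0, 10), frozenset({"p1"})),
    Commit("c2", datetime(2023, 6, 1, 5, 0), frozenset({"p2"})),  # completes late
    Commit("c3", datetime(2023, 6, 1, 2, 10), frozenset({"p3"})),
]

# First run at 03:00 with x = 1 hour: window (datetime.min, 02:00].
first, t1 = plan_clean(commits, datetime.min, datetime(2023, 6, 1, 3, 0), hours=1)
# Second run at 07:00: window (02:00, 06:00] picks up the late commit c2.
second, t2 = plan_clean(commits, t1, datetime(2023, 6, 1, 7, 0), hours=1)
print(sorted(first), sorted(second))  # ['p1'] ['p2', 'p3']
```

The late-completing c2 is outside the first run's window but is picked up
by the second, because the window is keyed on completion time and bounded
by the stored watermark rather than by instant time.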



--
This message was sent by Atlassian Jira
(v8.20.10#820010)