hudi-bot opened a new issue, #16014: URL: https://github.com/apache/hudi/issues/16014
In CleanPlanner, the KEEP_LATEST_BY_HOURS policy sets the earliestCommitToRetain value by considering the commit timestamp directly. This introduces a bug when commits are out of order, that is, when a commit with a lower timestamp completes much later than commits with higher timestamps.

This policy's implementation needs to be revisited. It should store the timestamp up to which it has cleaned; call this t1. The next cleaner instant should then consider all partitions and files modified between t1 and (current time - x) hours, and remove whichever files are no longer valid.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-6352
- Type: Bug
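
To make the out-of-order case concrete, here is a minimal sketch in plain Java. It does not use Hudi's API: the `Commit` class, the `inWindow` and `lateCompleted` helpers, and the integer times are all hypothetical names and values chosen for illustration. It assumes each commit has a requested timestamp and a separate completion time, and that the previous clean ran at a known time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names come from Hudi.
final class CleanWindowSketch {

    /** Stand-in for a commit: when it was requested and when it actually completed. */
    static final class Commit {
        final String name;
        final long requestedTime;
        final long completionTime;

        Commit(String name, long requestedTime, long completionTime) {
            this.name = name;
            this.requestedTime = requestedTime;
            this.completionTime = completionTime;
        }
    }

    /** Commits whose requested time falls in (t1, cutoff], where cutoff = current time - x hours. */
    static List<String> inWindow(List<Commit> timeline, long t1, long cutoff) {
        List<String> out = new ArrayList<>();
        for (Commit c : timeline) {
            if (c.requestedTime > t1 && c.requestedTime <= cutoff) {
                out.add(c.name);
            }
        }
        return out;
    }

    /**
     * Commits requested at or before t1 that completed only after the previous clean ran.
     * A scan keyed purely on requested timestamps, starting from t1, never sees these.
     */
    static List<String> lateCompleted(List<Commit> timeline, long t1, long previousCleanRunTime) {
        List<String> out = new ArrayList<>();
        for (Commit c : timeline) {
            if (c.requestedTime <= t1 && c.completionTime > previousCleanRunTime) {
                out.add(c.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Commit> timeline = List.of(
            new Commit("c1", 10, 40),   // low timestamp, completes last
            new Commit("c2", 20, 25),
            new Commit("c3", 30, 35));
        long t1 = 20;                   // timestamp up to which the previous clean covered
        long previousCleanRunTime = 36; // when the previous clean ran (assumed known)
        long cutoff = 30;               // current time - x hours
        System.out.println("window: " + inWindow(timeline, t1, cutoff));
        System.out.println("late:   " + lateCompleted(timeline, t1, previousCleanRunTime));
    }
}
```

In this example c1 has the lowest timestamp but completes after the previous clean has already run, so a window based only on timestamps picks up c3 and skips c1. How the real cleaner should account for such commits is left open by this sketch.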
