hbgstc123 opened a new pull request, #8443: URL: https://github.com/apache/hudi/pull/8443
### Change Logs

The original `ClusteringUtils::getOldestInstantToRetainForClustering` is based on the inflight clean instant. However, there may be a window in which the last clean has completed and the next clean plan has not yet been generated. If timeline archival runs during that window, no replace commit is retained.

This PR proposes deciding the oldest instant to retain for clustering from the latest completed clean instant instead:

- If the last completed clean has an `earliestInstantToRetain`, return the first replace commit after it.
- If `earliestInstantToRetain` is empty, return the first replace commit after the last clean instant.
- If there is no clean instant, return the first replace commit in the active timeline.

### Impact

no

### Risk level (write none, low medium or high below)

low

### Documentation Update

no

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
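
The selection rule described under Change Logs can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `OldestReplaceInstantSketch`, `CompletedClean`, and `oldestReplaceInstantToRetain` are hypothetical names, instants are modelled as fixed-width timestamp strings sorted in ascending order, and "after" is assumed to mean "at or after" the lower bound.

```java
import java.util.List;
import java.util.Optional;

public class OldestReplaceInstantSketch {

  /** Hypothetical stand-in for the last completed clean instant; not a Hudi class. */
  public static final class CompletedClean {
    final String timestamp;
    final Optional<String> earliestInstantToRetain;

    public CompletedClean(String timestamp, Optional<String> earliestInstantToRetain) {
      this.timestamp = timestamp;
      this.earliestInstantToRetain = earliestInstantToRetain;
    }
  }

  /**
   * Picks the oldest replace commit that archival should keep.
   *
   * @param replaceInstants    replace commit timestamps in the active timeline,
   *                           assumed sorted ascending and of equal width
   * @param lastCompletedClean the latest completed clean, if any
   */
  public static Optional<String> oldestReplaceInstantToRetain(
      List<String> replaceInstants, Optional<CompletedClean> lastCompletedClean) {
    // Case 3: no clean instant at all, so keep from the first replace commit.
    if (!lastCompletedClean.isPresent()) {
      return replaceInstants.stream().findFirst();
    }
    CompletedClean clean = lastCompletedClean.get();
    // Case 1: use earliestInstantToRetain when the clean recorded one.
    // Case 2: otherwise fall back to the clean instant's own timestamp.
    String lowerBound = clean.earliestInstantToRetain
        .filter(ts -> !ts.isEmpty())
        .orElse(clean.timestamp);
    // Assumption: the bound is inclusive; string comparison works because
    // the timestamps are fixed-width.
    return replaceInstants.stream()
        .filter(ts -> ts.compareTo(lowerBound) >= 0)
        .findFirst();
  }
}
```

For example, with replace commits `001`, `004`, and `007` and a last completed clean at `006`, the sketch returns `004` when `earliestInstantToRetain` is `003`, `007` when it is empty, and `001` when there is no clean instant at all.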
