hbgstc123 opened a new pull request, #8443:
URL: https://github.com/apache/hudi/pull/8443

   ### Change Logs
   
   The original `ClusteringUtils::getOldestInstantToRetainForClustering` is 
based on the inflight clean instant. However, there can be a window in which the 
last clean has completed and the next clean plan has not yet been generated. If 
timeline archival runs during that window, no replace commit is retained.  
   This PR proposes deciding the oldest instant to retain for clustering from 
the latest completed clean instant instead:
   - if the last completed clean has an `earliestInstantToRetain`, return the 
first replace commit after that instant;
   - if `earliestInstantToRetain` is empty, return the first replace commit 
after the last clean instant;
   - if there is no clean instant, return the first replace commit in the 
active timeline.
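
   The selection rule described above can be sketched as follows. This is only an 
illustration, not the code in this PR: `ClusteringRetentionSketch`, 
`CompletedClean`, and `oldestReplaceInstantToRetain` are hypothetical names, 
instants are modeled as plain timestamp strings sorted in ascending order, an 
empty `earliestInstantToRetain` is modeled as `Optional.empty()`, and "after" 
is assumed to mean strictly greater.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model; it does not use Hudi's timeline classes.
final class ClusteringRetentionSketch {

  // Simplified view of the latest completed clean instant (assumed shape).
  static final class CompletedClean {
    final String instantTime;
    final Optional<String> earliestInstantToRetain;

    CompletedClean(String instantTime, Optional<String> earliestInstantToRetain) {
      this.instantTime = instantTime;
      this.earliestInstantToRetain = earliestInstantToRetain;
    }
  }

  // replaceInstants: replace commit timestamps in the active timeline, ascending.
  // Timestamps are assumed to have equal length so that lexicographic order
  // matches chronological order.
  static Optional<String> oldestReplaceInstantToRetain(
      List<String> replaceInstants, Optional<CompletedClean> lastCompletedClean) {
    if (!lastCompletedClean.isPresent()) {
      // No clean instant: keep everything from the first replace commit.
      return replaceInstants.stream().findFirst();
    }
    CompletedClean clean = lastCompletedClean.get();
    // Prefer earliestInstantToRetain; fall back to the clean instant itself.
    String boundary = clean.earliestInstantToRetain.orElse(clean.instantTime);
    // Assumption: "after" means strictly greater than the boundary.
    return replaceInstants.stream()
        .filter(t -> t.compareTo(boundary) > 0)
        .findFirst();
  }
}
```

   Because the result depends only on completed clean metadata, it does not 
change during the gap between one clean finishing and the next clean plan being 
generated.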
   
   ### Impact
   
   no
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   no
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


