hudi-bot opened a new issue, #16115:
URL: https://github.com/apache/hudi/issues/16115

   When performing a clean, `CleanPlanner` uses the earliest commit to retain, 
obtained from its `getEarliestCommitToRetain` method, as the endpoint of the 
clean. However, when a pending commit runs for a long time and all commits 
earlier than it have been archived, the pending commit becomes the earliest 
instant on the active timeline. If `getEarliestCommitToRetain` is called in 
this situation, it returns empty, because there is no completed commit earlier 
than the pending one. An incremental clean uses the previous endpoint, i.e. the 
earliest commit retained by the previous clean, as its starting point. If this 
starting point is empty, a full clean is triggered instead, which is very 
resource-intensive.
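
A minimal Java sketch of why an empty starting point is expensive. It assumes a simplified, hypothetical model in which an incremental clean only inspects partitions touched by commits in the range [previous endpoint, current endpoint), while a full clean lists every partition. `CleanScopeSketch`, `partitionsToScan` and the commit-to-partition map are illustrative names, not Hudi APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model, not Hudi's CleanPlanner: shows how the start point recorded
 * by the previous clean decides between an incremental and a full clean.
 * Instant times are strings that sort lexicographically.
 */
public class CleanScopeSketch {

    static Set<String> partitionsToScan(Optional<String> previousRetained,
                                        String currentRetained,
                                        Map<String, Set<String>> partitionsByCommit,
                                        Set<String> allPartitions) {
        if (previousRetained.isEmpty()) {
            // No start point from the previous clean: every partition must be scanned.
            return new TreeSet<>(allPartitions);
        }
        // Incremental: only partitions touched by commits in [previous, current).
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : partitionsByCommit.entrySet()) {
            String commit = e.getKey();
            boolean inRange = commit.compareTo(previousRetained.get()) >= 0
                    && commit.compareTo(currentRetained) < 0;
            if (inRange) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("p1", "p2", "p3", "p4");
        Map<String, Set<String>> commits = Map.of(
                "010", Set.of("p1"),
                "020", Set.of("p2"),
                "030", Set.of("p3"));
        System.out.println(partitionsToScan(Optional.of("010"), "030", commits, all)); // [p1, p2]
        System.out.println(partitionsToScan(Optional.empty(), "030", commits, all));   // [p1, p2, p3, p4]
    }
}
```

On a real table the full-clean branch means listing every partition, which is where the resource cost described above comes from.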
   
   To solve this problem, I set `EarliestCommitToRetain` to the earliest 
pending commit in this case. Because the endpoint itself is never cleaned in 
the current clean, the pending commit stays protected, and normal cleans (where 
a completed commit earlier than the pending one exists) are unaffected.
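
A minimal Java sketch of the proposed fallback. The `Instant` record, the `commitsRetained` parameter, the `applyFix` flag and the rule "keep the latest N completed commits, but never move past the earliest pending instant" are a simplified, hypothetical model, not Hudi's actual `CleanPlanner` implementation.

```java
import java.util.List;
import java.util.Optional;

/**
 * Simplified, hypothetical model of the fix described in this issue; it is not
 * Hudi's CleanPlanner code. Instant times are strings that sort lexicographically.
 */
public class EarliestRetainSketch {

    /** One instant on the active timeline. */
    record Instant(String time, boolean completed) {}

    /**
     * Keeps the latest {@code commitsRetained} completed commits, but never lets the
     * earliest commit to retain move past the earliest pending instant.
     *
     * @param timeline        active timeline, sorted by instant time
     * @param commitsRetained number of completed commits to keep (assumed >= 1)
     * @param applyFix        if true, fall back to the earliest pending instant when
     *                        no completed commit precedes it
     */
    static Optional<String> earliestCommitToRetain(List<Instant> timeline,
                                                   int commitsRetained,
                                                   boolean applyFix) {
        List<String> completed = timeline.stream()
                .filter(Instant::completed)
                .map(Instant::time)
                .toList();
        if (completed.size() <= commitsRetained) {
            return Optional.empty(); // too few commits: nothing to clean yet
        }
        String candidate = completed.get(completed.size() - commitsRetained);

        Optional<String> earliestPending = timeline.stream()
                .filter(i -> !i.completed())
                .map(Instant::time)
                .findFirst();
        if (earliestPending.isEmpty() || candidate.compareTo(earliestPending.get()) < 0) {
            return Optional.of(candidate); // normal case, unaffected by the fix
        }

        // The candidate lies after the earliest pending instant: use the latest
        // completed commit before that instant instead.
        String pending = earliestPending.get();
        Optional<String> beforePending = completed.stream()
                .filter(t -> t.compareTo(pending) < 0)
                .reduce((first, second) -> second);
        if (beforePending.isEmpty() && applyFix) {
            // Earlier commits were archived: retain from the pending instant itself,
            // so the clean still records a usable endpoint.
            return earliestPending;
        }
        return beforePending;
    }

    /** The scenario from the issue: "005" is still pending, older commits are archived. */
    static List<Instant> issueScenario() {
        return List.of(
                new Instant("005", false),
                new Instant("010", true),
                new Instant("020", true),
                new Instant("030", true));
    }

    public static void main(String[] args) {
        System.out.println(earliestCommitToRetain(issueScenario(), 2, false)); // Optional.empty
        System.out.println(earliestCommitToRetain(issueScenario(), 2, true));  // Optional[005]
    }
}
```

In this model, when the earliest pending instant is later than the candidate (for example completed `010`, `020`, `030` and pending `040`), both variants return `020`, which is the "normal clean is unaffected" property the fix relies on.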
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-6574
   - Type: Bug
