[ 
https://issues.apache.org/jira/browse/HUDI-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6574:
---------------------------------
    Labels: pull-request-available  (was: )

> Fix the problem that incremental clean cannot be executed when the earliest 
> ActiveTimeline is a pending commit.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-6574
>                 URL: https://issues.apache.org/jira/browse/HUDI-6574
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ma Jian
>            Priority: Major
>              Labels: pull-request-available
>
> When performing a clean, the earliest commit to retain, as returned by the 
> getEarliestCommitToRetain method in CleanPlanner, is used as the endpoint of 
> the clean. However, when a pending commit takes a long time and all the 
> commits earlier than the pending commit have been archived, the pending 
> commit becomes the earliest instant in the active timeline. In this 
> situation, getEarliestCommitToRetain returns empty because there is no 
> commit earlier than the pending commit. An incremental clean uses the 
> previous endpoint, which is the last commit retained in the previous clean, 
> as its starting point. If this starting point is empty, a full clean is 
> triggered instead, which is very resource-intensive.
> To solve this problem without affecting normal cleans, I set the 
> EarliestCommitToRetain obtained in this case to the earliest pending commit. 
> Since the endpoint itself is not cleaned in the current clean, this avoids 
> the full clean without changing normal clean behavior.
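
The behavior described above can be sketched roughly as follows. This is a simplified, self-contained model and not Hudi's actual CleanPlanner code: the class EarliestCommitToRetainSketch, the TimelineInstant type, and the fallBackToPending flag are all hypothetical names introduced for illustration, and retention-count policies are ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class EarliestCommitToRetainSketch {

    // Hypothetical stand-in for a timeline instant: a sortable timestamp plus a completion flag.
    static final class TimelineInstant {
        final String timestamp;
        final boolean completed;

        TimelineInstant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    // Simplified model of the lookup; retention-count policies are ignored.
    // activeTimeline is assumed to be sorted by timestamp, oldest first.
    static Optional<String> earliestCommitToRetain(List<TimelineInstant> activeTimeline,
                                                   boolean fallBackToPending) {
        Optional<TimelineInstant> earliestPending = activeTimeline.stream()
            .filter(i -> !i.completed)
            .findFirst();
        if (!earliestPending.isPresent()) {
            // No pending commit: in this simplified model, retain from the oldest active instant.
            return activeTimeline.stream().findFirst().map(i -> i.timestamp);
        }
        String pendingTs = earliestPending.get().timestamp;

        // Latest completed commit that is older than the earliest pending commit.
        Optional<String> lastCompletedBefore = activeTimeline.stream()
            .filter(i -> i.completed && i.timestamp.compareTo(pendingTs) < 0)
            .map(i -> i.timestamp)
            .reduce((older, newer) -> newer);

        if (lastCompletedBefore.isPresent() || !fallBackToPending) {
            // Without the fallback, this is empty when the pending commit is the oldest instant.
            return lastCompletedBefore;
        }
        // Proposed behavior: use the earliest pending commit as the endpoint.
        return Optional.of(pendingTs);
    }

    public static void main(String[] args) {
        // Everything older than the pending commit "003" has already been archived.
        List<TimelineInstant> timeline = Arrays.asList(
            new TimelineInstant("003", false),
            new TimelineInstant("004", true));
        System.out.println(earliestCommitToRetain(timeline, false)); // Optional.empty
        System.out.println(earliestCommitToRetain(timeline, true));  // Optional[003]
    }
}
```

With the fallback disabled, the empty result is what would leave the next incremental clean without a starting point. With it enabled, the pending commit "003" becomes the endpoint, so the next clean has a non-empty starting point.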



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
