[
https://issues.apache.org/jira/browse/HUDI-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6574:
---------------------------------
Labels: pull-request-available (was: )
> Fix the problem that incremental clean cannot be executed when the earliest
> ActiveTimeline is a pending commit.
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-6574
> URL: https://issues.apache.org/jira/browse/HUDI-6574
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ma Jian
> Priority: Major
> Labels: pull-request-available
>
> When performing a clean, the earliest commit to retain, as returned by the
> getEarliestCommitToRetain method in CleanPlanner, is used as the endpoint of
> the clean. However, when a pending commit takes a long time and all the
> commits earlier than it have been archived, the pending commit becomes the
> earliest instant on the active timeline. In this situation,
> getEarliestCommitToRetain returns empty because there is no commit earlier
> than the pending commit. An incremental clean uses the previous endpoint,
> which is the last commit retained in the previous clean, as its starting
> point. If this starting point is empty, a full clean is triggered instead,
> which is very resource-intensive.
> To solve this problem, I set the EarliestCommitToRetain obtained in this
> case to the earliest pending commit. Because the endpoint itself is not
> cleaned in the current clean, this avoids the full clean without affecting
> normal clean.
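
The behavior described above can be sketched with a simplified model. This is
an illustration only, not Hudi's implementation: the class, the Instant type,
the method capFromEarliestPending and the fallBackToPending flag are all
hypothetical names, and the retention-count logic of the real
getEarliestCommitToRetain is left out. Only the boundary imposed by a pending
instant is modeled.

```java
import java.util.List;
import java.util.Optional;

// Simplified model for illustration only; none of these types are Hudi classes.
final class EarliestCommitToRetainSketch {

    // Hypothetical stand-in for an instant on the active timeline.
    static final class Instant {
        final String timestamp;
        final boolean completed;

        Instant(String timestamp, boolean completed) {
            this.timestamp = timestamp;
            this.completed = completed;
        }
    }

    // Returns the boundary imposed by the earliest pending instant.
    // activeTimeline is assumed to be sorted by timestamp, oldest first.
    // fallBackToPending = false models the behavior reported in the issue;
    // fallBackToPending = true models the proposed fix.
    static Optional<String> capFromEarliestPending(List<Instant> activeTimeline,
                                                   boolean fallBackToPending) {
        Optional<String> lastCompletedBeforePending = Optional.empty();
        for (Instant instant : activeTimeline) {
            if (!instant.completed) {
                // Reached the earliest pending instant.
                if (lastCompletedBeforePending.isPresent() || !fallBackToPending) {
                    // Either a completed commit precedes it, or (old behavior)
                    // nothing does and the result is empty.
                    return lastCompletedBeforePending;
                }
                // Proposed fix: use the pending instant itself as the endpoint.
                return Optional.of(instant.timestamp);
            }
            lastCompletedBeforePending = Optional.of(instant.timestamp);
        }
        // No pending instant: the cap does not apply (normal logic, not modeled).
        return Optional.empty();
    }
}
```

For a timeline whose first instant "003" is pending and is followed by a
completed "004", the sketch returns empty with fallBackToPending = false and
"003" with fallBackToPending = true, so the next incremental clean would have
a non-empty starting point.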
--
This message was sent by Atlassian Jira
(v8.20.10#820010)