[ 
https://issues.apache.org/jira/browse/HUDI-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883124#comment-17883124
 ] 

Ethan Guo commented on HUDI-8077:
---------------------------------

Based on the earlier discussion, "HoodieCleanMetadata" currently stores the 
start (instant) time in the earliestCommitToRetain and 
lastCompletedCommitTimestamp fields.  These fields should be changed to store 
the completion time, and all relevant logic should use completion time when 
listing the instants and files to clean, so that we do not miss cleaning 
concurrent instants, if there are any.
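
To illustrate the concern, here is a minimal sketch of how filtering by start 
time can skip a long-running concurrent instant while filtering by completion 
time does not. It assumes a heavily simplified timeline model: the Instant 
class, the helper functions, and the timestamps are hypothetical and are not 
Hudi APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    # Hypothetical simplified instant; real Hudi instants carry more state.
    start_time: str       # instant (start) time
    completion_time: str  # time at which the instant completed


def instants_since_by_start(instants, marker_start_time):
    """Select instants whose START time is at or after the marker."""
    return [i for i in instants if i.start_time >= marker_start_time]


def instants_since_by_completion(instants, marker_completion_time):
    """Select instants whose COMPLETION time is at or after the marker."""
    return [i for i in instants if i.completion_time >= marker_completion_time]


# A starts first but finishes last; B and C are short concurrent writes.
a = Instant(start_time="001", completion_time="010")
b = Instant(start_time="003", completion_time="004")
c = Instant(start_time="006", completion_time="007")
timeline = [a, b, c]

# Assume the previous clean ran at "005" and recorded B as its marker.
by_start = instants_since_by_start(timeline, marker_start_time="003")
by_completion = instants_since_by_completion(timeline, marker_completion_time="004")

# Instants the start-time filter never considers.
missed = [i for i in by_completion if i not in by_start]
```

In this toy timeline, A started before the marker but completed after the 
previous clean, so the start-time filter drops it while the completion-time 
filter keeps it.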

> Fix the incremental cleaning to base on completion time
> -------------------------------------------------------
>
>                 Key: HUDI-8077
>                 URL: https://issues.apache.org/jira/browse/HUDI-8077
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Danny Chen
>            Assignee: Lin Liu
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> Currently, incremental cleaning remembers a marker instant (the last 
> retained instant) in the commit metadata. Both the marker and the instant 
> filtering on the fs view use start times (instant times). This is fine in 
> most cases because there is some buffer for cleaning (30 commits are 
> retained by default), but if the user sets up a very aggressive strategy, 
> such as cleaning on every commit, there might be issues in NB-CC mode: 
> an instant that starts very early but finishes recently might be skipped by 
> the cleaning table service.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)