[ 
https://issues.apache.org/jira/browse/HUDI-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884156#comment-17884156
 ] 

Y Ethan Guo commented on HUDI-8077:
-----------------------------------

After discussing with [~linliu] and [~danny0405] and going through the code, 
we found that, while storing the completion time would be good for consistency, 
the change is not necessary: the earliest commit to retain and the last 
completed commit in the clean metadata can keep using the start/instant time 
for cleaning without affecting correctness.

[~linliu] We should still have a follow-up task to document this clearly, e.g., 
on the Cleaning and Tech Spec pages, so that users who rely on these timestamps 
for debugging know that the clean metadata still uses the start/instant time.

> Fix the incremental cleaning to base on completion time
> -------------------------------------------------------
>
>                 Key: HUDI-8077
>                 URL: https://issues.apache.org/jira/browse/HUDI-8077
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Danny Chen
>            Assignee: Lin Liu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> Currently, incremental cleaning remembers a marker instant for the last 
> retained commit in the commit metadata. Both the marker and the filtering 
> instant on the fs view are start times (instant times). This is okay in most 
> cases because there is some buffer time for cleaning (30 commits retained by 
> default), but if the user sets up a very radical strategy, such as cleaning 
> on every commit, then there might be issues in NB-CC mode: 
> an instant that starts very early but finishes recently might be skipped by 
> the cleaning table service.
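
The concern in the description can be illustrated with a toy model. This is a 
minimal sketch and not Hudi code: the Instant class, both helper functions, and 
the integer timestamps are hypothetical and exist only to show how filtering on 
start time versus completion time can differ when writers overlap.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a timeline instant (not a Hudi class)."""
    name: str
    start_time: int       # when the writer began the commit
    completion_time: int  # when the commit finished


def scanned_by_start_time(timeline: List[Instant], marker: Instant) -> List[str]:
    """Instants a later clean would look at if it filters on start time."""
    return [i.name for i in timeline if i.start_time >= marker.start_time]


def scanned_by_completion_time(timeline: List[Instant], marker: Instant) -> List[str]:
    """Instants a later clean would look at if it filters on completion time."""
    return [i.name for i in timeline if i.completion_time >= marker.completion_time]


# Overlapping writers: c1 starts first but finishes after c2.
c1 = Instant("c1", start_time=1, completion_time=6)
c2 = Instant("c2", start_time=2, completion_time=3)
c3 = Instant("c3", start_time=5, completion_time=7)
timeline = [c1, c2, c3]

# Assumption: an aggressive clean ran at time 4, when only c2 had completed,
# and recorded c2 as its marker.
marker = c2

by_start = scanned_by_start_time(timeline, marker)
by_completion = scanned_by_completion_time(timeline, marker)

# Instants that a start-time filter would not revisit in this toy model.
missed = [name for name in by_completion if name not in by_start]
print(by_start, by_completion, missed)
```

In this model c1 started before the marker but completed after it, so only the 
completion-time filter picks it up. The sketch covers the filtering step alone; 
whether the difference matters for correctness depends on the rest of the 
cleaning logic, which it does not model.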



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
