[ 
https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-7226.
----------------------------
    Fix Version/s: 0.14.1
                       (was: 1.1.0)
         Assignee: Timothy Brown  (was: Raymond Xu)
       Resolution: Fixed

Fixed in [https://github.com/apache/hudi/pull/10307]

> Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain
> ----------------------------------------------------------------------
>
>                 Key: HUDI-7226
>                 URL: https://issues.apache.org/jira/browse/HUDI-7226
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: cleaning
>            Reporter: Raymond Xu
>            Assignee: Timothy Brown
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.14.1
>
>
> org.apache.hudi.table.action.clean.CleanPlanner#getFilesToCleanKeepingLatestCommits(java.lang.String,
>  int, org.apache.hudi.common.model.HoodieCleaningPolicy)
> lastVersionBeforeEarliestCommitToRetain is not honored by the 
> KEEP_LATEST_BY_HOURS policy. This essentially causes the cleaner to remove a 
> file slice as soon as it becomes non-latest. This can fail long-running 
> queries in a race condition:
> # the timeline contains t0.deltacommit (not cleaned because it is the latest)
> # a snapshot query starts and keeps running
> # compaction runs and creates t1.commit
> # the cleaner runs and removes t0 (because t1.commit is now the latest)
> # the query fails because a log file belonging to t0.deltacommit is not found
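
The race described above can be illustrated with a small model. This is only a sketch under stated assumptions, not Hudi's CleanPlanner code: the function name slices_to_clean, its parameters, and the short instant strings are hypothetical, and a file group is reduced to a list of lexicographically sortable instant times.

```python
from typing import List


def slices_to_clean(slice_times: List[str],
                    earliest_to_retain: str,
                    keep_last_version_before: bool) -> List[str]:
    """Pick the file slices of one file group that a cleaner would delete.

    Hypothetical model: slice_times are sortable instant strings and
    earliest_to_retain is the boundary derived from the retention window.
    """
    ordered = sorted(slice_times)
    if not ordered:
        return []
    latest = ordered[-1]
    # Slices strictly older than the retention boundary are candidates.
    before = [t for t in ordered if t < earliest_to_retain]
    # The newest slice older than the boundary may still be read by a
    # query that started before a later slice was committed.
    last_before = before[-1] if before else None
    to_clean = []
    for t in before:
        if t == latest:
            continue  # the latest slice is never cleaned
        if keep_last_version_before and t == last_before:
            continue  # honor the last version before the boundary
        to_clean.append(t)
    return to_clean


if __name__ == "__main__":
    # "001" stands in for t0.deltacommit, "003" for t1.commit (assumed values).
    slices = ["001", "003"]
    print(slices_to_clean(slices, "002", keep_last_version_before=False))  # ['001']
    print(slices_to_clean(slices, "002", keep_last_version_before=True))   # []
```

With the flag off, the t0 slice is selected for deletion as soon as t1 exists, which matches the reported failure. With the flag on, t0 survives until a newer slice also falls before the retention boundary.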



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
