[
https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raymond Xu closed HUDI-7226.
----------------------------
Fix Version/s: 0.14.1
(was: 1.1.0)
Assignee: Timothy Brown (was: Raymond Xu)
Resolution: Fixed
fixed in [https://github.com/apache/hudi/pull/10307]
> Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain
> ----------------------------------------------------------------------
>
> Key: HUDI-7226
> URL: https://issues.apache.org/jira/browse/HUDI-7226
> Project: Apache Hudi
> Issue Type: Improvement
> Components: cleaning
> Reporter: Raymond Xu
> Assignee: Timothy Brown
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.14.1
>
>
> In org.apache.hudi.table.action.clean.CleanPlanner#getFilesToCleanKeepingLatestCommits(java.lang.String,
> int, org.apache.hudi.common.model.HoodieCleaningPolicy),
> lastVersionBeforeEarliestCommitToRetain is not honored by the
> KEEP_LATEST_BY_HOURS policy. As a result, the cleaner removes a
> file slice once it becomes non-latest. This can fail long-running queries
> through a race condition:
> # the timeline contains t0.deltacommit (not cleaned because it is the latest)
> # a snapshot query starts and is still running
> # compaction runs and creates t1.commit
> # the cleaner runs and removes t0 (because t1.commit is now the latest)
> # the query fails because a log file belonging to t0.deltacommit is not found
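
The race above can be illustrated with a minimal, simplified sketch of the retention decision for one file group. This is not Hudi's CleanPlanner code: the class CleanByHoursSketch, the method slicesToClean, and the honorLastVersionBefore flag are hypothetical names introduced only for illustration, and file slices are modeled as plain instant-time strings sorted newest first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of hour-based cleaning for ONE file group.
// It is not the Hudi implementation; it only mirrors the behavior described
// in the issue.
final class CleanByHoursSketch {

    /**
     * @param sliceInstantsNewestFirst instant times of the file slices, newest first
     * @param earliestCommitToRetain   oldest instant still inside the retention window
     * @param honorLastVersionBefore   false models the reported behavior,
     *                                 true models the expected behavior
     * @return instant times of the slices that would be cleaned
     */
    static List<String> slicesToClean(List<String> sliceInstantsNewestFirst,
                                      String earliestCommitToRetain,
                                      boolean honorLastVersionBefore) {
        List<String> toClean = new ArrayList<>();
        boolean keptLastVersionBefore = false;
        for (int i = 0; i < sliceInstantsNewestFirst.size(); i++) {
            String instant = sliceInstantsNewestFirst.get(i);
            // Slices at or after the earliest retained commit are inside the window.
            if (instant.compareTo(earliestCommitToRetain) >= 0) {
                continue;
            }
            // The newest slice older than the window may still be read by a
            // query that started before the next slice was written.
            if (honorLastVersionBefore && !keptLastVersionBefore) {
                keptLastVersionBefore = true;
                continue;
            }
            // The latest slice of the file group is never cleaned.
            if (i == 0) {
                continue;
            }
            toClean.add(instant);
        }
        return toClean;
    }
}
```

With slices "20240101120000" (t1), "20240101080000" (t0) and "20240101020000", and earliestCommitToRetain set to t1, passing false returns t0 and the oldest slice, which is the situation in which the running query loses its log file. Passing true returns only the oldest slice, so t0 stays readable.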
--
This message was sent by Atlassian Jira
(v8.20.10#820010)