[
https://issues.apache.org/jira/browse/HUDI-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-7104:
----------------------------
Component/s: savepoint
             table-service
> Cleaner could miss to clean up some files w/ savepoint interplay
> -----------------------------------------------------------------
>
> Key: HUDI-7104
> URL: https://issues.apache.org/jira/browse/HUDI-7104
> Project: Apache Hudi
> Issue Type: Improvement
> Components: cleaning, savepoint, table-service
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> Let's say the table is partitioned by day, based on created date. So, older
> partitions generally do not get any new data after a few days.
>
> Let's say a savepoint was added to a day and later removed.
> day1: cleaned up.
> day2: savepoint added, so the cleaner ignored it.
> day3: cleaned up.
> day4: earliest commit to retain based on cleaner configs.
>
> So, with this table/timeline state, if we remove the savepointed commit, data
> pertaining to day2 will never be cleaned by the cleaner, since it is earlier
> than the earliest commit to retain.
>
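The scenario above can be sketched with a toy model. This is not Hudi's cleaner code; it only assumes that each cleaner round scans the partitions written by commits between the previous and the new earliest commit to retain, and skips partitions written by savepointed commits. The names `clean_round` and `simulate`, the integer instants, and the one-partition-per-commit layout are all hypothetical.

```python
def clean_round(commits, savepoints, prev_retain, new_retain, uncleaned):
    """One simplified cleaner round (illustrative model, not Hudi's API).

    commits:     dict mapping commit instant -> partition it wrote to
    savepoints:  set of savepointed commit instants
    prev_retain: earliest commit to retain recorded by the previous round
    new_retain:  earliest commit to retain for this round
    uncleaned:   partitions that still hold files eligible for cleaning
    Returns the set of partitions cleaned in this round.
    """
    # Assumption: only partitions touched in [prev_retain, new_retain) are scanned.
    scanned = {p for t, p in commits.items() if prev_retain <= t < new_retain}
    # Partitions written by savepointed commits are skipped.
    protected = {commits[t] for t in savepoints}
    return (scanned - protected) & uncleaned


def simulate():
    commits = {1: "day1", 2: "day2", 3: "day3", 4: "day4"}
    uncleaned = set(commits.values())

    # Round 1: savepoint on the day2 commit, earliest commit to retain is 4.
    first = clean_round(commits, {2}, 0, 4, uncleaned)
    uncleaned -= first

    # The savepoint is removed and a new commit arrives for day5.
    commits[5] = "day5"
    uncleaned.add("day5")

    # Round 2: no savepoints left, earliest commit to retain moves to 5.
    second = clean_round(commits, set(), 4, 5, uncleaned)
    uncleaned -= second
    return first, second, uncleaned


if __name__ == "__main__":
    first, second, leftover = simulate()
    print("round 1 cleaned:", sorted(first))
    print("round 2 cleaned:", sorted(second))
    print("still uncleaned:", sorted(leftover))
```

In this model day2 stays in the uncleaned set after the savepoint is removed, because later rounds only scan from the previously recorded earliest commit to retain onward.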