sivabalan narayanan created HUDI-7104:
-----------------------------------------
Summary: Cleaner could miss to clean up some files w/ savepoint
interplay
Key: HUDI-7104
URL: https://issues.apache.org/jira/browse/HUDI-7104
Project: Apache Hudi
Issue Type: Improvement
Components: cleaning
Reporter: sivabalan narayanan
Let's say partitioning is day-based and keyed on created date, so older
partitions generally do not receive any new data after a few days.
Let's say a savepoint was added for one day and later removed.
day1: cleaned up.
day2: savepoint added, so the cleaner ignored it.
day3: cleaned up.
day4: earliest commit to retain based on cleaner configs.
So, with this table/timeline state, if we remove the savepoint on the day2
commit, data pertaining to day2 will never be cleaned by the cleaner, since it
is older than the earliest commit to retain.
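
The scenario can be sketched with a toy model. This is not Hudi's cleaner
code: ToyTable, run_cleaner and simulate are hypothetical names, commit times
are plain day numbers, and the model assumes the cleaner only visits
partitions written at or after the previous earliest commit to retain.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    # partition -> day number of its last write (assumed: one write per day)
    last_write: dict
    # partition -> count of stale file versions waiting to be cleaned
    stale_files: dict
    # partitions whose files are protected by a savepoint
    savepointed: set = field(default_factory=set)
    # commits older than this are assumed to be already handled
    earliest_commit_to_retain: int = 0


def run_cleaner(table, new_earliest_commit_to_retain):
    """Toy incremental clean over [previous ECTR, new ECTR)."""
    cleaned = []
    for partition in sorted(table.last_write):
        commit = table.last_write[partition]
        in_window = (
            table.earliest_commit_to_retain
            <= commit
            < new_earliest_commit_to_retain
        )
        # Skip partitions outside the window and savepointed partitions.
        if not in_window or partition in table.savepointed:
            continue
        if table.stale_files[partition] > 0:
            table.stale_files[partition] = 0
            cleaned.append(partition)
    # The window start moves forward and never moves back.
    table.earliest_commit_to_retain = new_earliest_commit_to_retain
    return cleaned


def simulate():
    days = ["day1", "day2", "day3", "day4"]
    table = ToyTable(
        last_write={d: i for i, d in enumerate(days, start=1)},
        stale_files={d: 2 for d in days},
        savepointed={"day2"},
    )
    # First clean: day4 becomes the earliest commit to retain.
    first = run_cleaner(table, 4)
    # The savepoint is removed afterwards.
    table.savepointed.discard("day2")
    # A later clean starts from day4, so day2 is never revisited.
    second = run_cleaner(table, 5)
    return first, second, table.stale_files


if __name__ == "__main__":
    print(simulate())
```

In this model the first run cleans day1 and day3 and skips day2. Once the
savepoint is gone, later runs start from day4, so the stale files in day2
stay in place indefinitely.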