sivabalan narayanan created HUDI-7104:
-----------------------------------------

             Summary: Cleaner could miss to clean up some files w/ savepoint interplay
                 Key: HUDI-7104
                 URL: https://issues.apache.org/jira/browse/HUDI-7104
             Project: Apache Hudi
          Issue Type: Improvement
          Components: cleaning
            Reporter: sivabalan narayanan


Let's say partitioning is day-based and keyed on the created date. Older 
partitions then generally do not receive any new data after a few days. 
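A minimal sketch of that premise, assuming records are routed to a `yyyy/MM/dd` partition path derived from their created date. `partition_path` and `partitions_touched` are hypothetical helpers for illustration, not Hudi APIs. A write batch made up of recently created records only touches recent partitions, so an older day's partition stops receiving writes.

```python
from datetime import date, timedelta


def partition_path(created: date) -> str:
    """Hypothetical day-based partition path derived from a record's created date."""
    return created.strftime("%Y/%m/%d")


def partitions_touched(created_dates):
    """Set of partitions a write batch touches, given each record's created date."""
    return {partition_path(d) for d in created_dates}


# A batch written on 2023-11-20 whose records were created within the last two days.
today = date(2023, 11, 20)
batch = [today, today - timedelta(days=1), today]

# Only the two most recent day partitions are written; e.g. 2023/11/14 is untouched.
print(sorted(partitions_touched(batch)))
```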

 

Let's say a savepoint is added for one day and later removed. 

day 1: cleaned up.

day 2: savepoint added, so the cleaner skipped the savepointed files.

day 3: cleaned up.

day 4: earliest commit to retain, based on the cleaner configs.

With this table/timeline state, if we remove the savepointed commit, the data 
pertaining to day 2 will never be cleaned by the cleaner: it is older than 
the earliest commit to retain, and since its partition receives no new data, 
later cleans never revisit it. 
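The miss can be reproduced in a simplified model. This sketch assumes an incremental clean planner that only scans partitions written by commits between the previous clean's earliest commit to retain (ECTR) and the current one, and it treats a file version as protected only when its own commit is savepointed. `FileVersion`, `partitions_written` and `plan_clean` are hypothetical names, not Hudi's cleaner code. The stale day 2 file version survives clean B because its partition gets no writes in the scanned window, while a full scan over all partitions would still delete it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    partition: str
    file_id: str
    commit: int  # commit instant, simplified to a day number


def partitions_written(versions, start, end):
    """Assumption: incremental planning scans partitions written in [start, end)."""
    return {v.partition for v in versions if start <= v.commit < end}


def plan_clean(versions, partitions, ectr, savepoints):
    """Versions to delete in `partitions`: superseded by a newer version of the
    same file at or before the ECTR, and not written by a savepointed commit."""
    to_delete = set()
    for v in versions:
        if v.partition not in partitions or v.commit in savepoints:
            continue
        superseded = any(
            o.partition == v.partition and o.file_id == v.file_id
            and v.commit < o.commit <= ectr
            for o in versions
        )
        if superseded:
            to_delete.add(v)
    return to_delete


versions = [
    FileVersion("2023/11/02", "f1", 2),
    FileVersion("2023/11/02", "f1", 3),  # late update to the day 2 partition
    FileVersion("2023/11/04", "f2", 4),
    FileVersion("2023/11/05", "f3", 5),
    FileVersion("2023/11/06", "f4", 6),
]

# Clean A (ECTR 4): the savepoint on commit 2 protects f1@2, so nothing is deleted.
clean_a = plan_clean(versions, partitions_written(versions, 2, 4), ectr=4, savepoints={2})

# The savepoint is removed. Clean B (ECTR 6) only scans partitions written in [4, 6),
# which excludes 2023/11/02, so f1@2 is left behind.
clean_b = plan_clean(versions, partitions_written(versions, 4, 6), ectr=6, savepoints=set())

# A full scan over every partition would find and delete f1@2.
all_partitions = {v.partition for v in versions}
full_scan = plan_clean(versions, all_partitions, ectr=6, savepoints=set())

print(clean_a, clean_b, full_scan)
```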

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
