zxy created HUDI-4515:
-------------------------

             Summary: Savepointed files are cleaned under the keep-latest-versions policy
                 Key: HUDI-4515
                 URL: https://issues.apache.org/jira/browse/HUDI-4515
             Project: Apache Hudi
          Issue Type: Bug
          Components: cleaning
    Affects Versions: 0.11.1
            Reporter: zxy
         Attachments: image-2022-08-01-16-48-16-901.png

While testing how clean interacts with savepoints, I found that when the cleaner
uses the keep-latest-versions policy, files belonging to a savepoint are deleted.
After reading the code, I believe this is a bug.

The relevant code is {{getFilesToCleanKeepingLatestVersions}} in:

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java

!image-2022-08-01-16-48-16-901.png|width=572,height=446!

It can be fixed by checking for savepointed files in the loop that collects the files to delete:

{code:java}
while (fileSliceIterator.hasNext() && keepVersions > 0) {
  // Skip this most recent version
  fileSliceIterator.next();
  keepVersions--;
}
// Delete the remaining files
while (fileSliceIterator.hasNext()) {
  FileSlice nextSlice = fileSliceIterator.next();
  Option<HoodieBaseFile> dataFile = nextSlice.getBaseFile();
  if (dataFile.isPresent() && savepointedFiles.contains(dataFile.get().getFileName())) {
    // do not clean up a savepoint data file
    continue;
  }
  deletePaths.addAll(getCleanFileInfoForSlice(nextSlice));
}
{code}
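
To illustrate the intended behavior outside Hudi, here is a minimal, self-contained sketch of the proposed loop. It is not Hudi code: the Slice record and the filesToClean method are hypothetical stand-ins for FileSlice and the planner logic, only base file names are modeled (log files are ignored), and slices are assumed to be ordered newest first. The honorSavepoints flag turns the savepoint check off to mimic the reported behavior, where savepointed files end up in the delete list.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class KeepLatestVersionsSketch {

  // Hypothetical stand-in for a Hudi FileSlice; only the base file name is modeled.
  record Slice(String baseFileName) {}

  // Retains the newest keepVersions slices, then returns the base files of all
  // older slices, skipping savepointed ones when honorSavepoints is true.
  // Assumes newestFirst is ordered from newest to oldest.
  static List<String> filesToClean(List<Slice> newestFirst, int keepVersions,
                                   Set<String> savepointedFiles, boolean honorSavepoints) {
    Iterator<Slice> it = newestFirst.iterator();
    while (it.hasNext() && keepVersions > 0) {
      it.next(); // retain this recent version
      keepVersions--;
    }
    List<String> toDelete = new ArrayList<>();
    while (it.hasNext()) {
      Slice slice = it.next();
      if (honorSavepoints && savepointedFiles.contains(slice.baseFileName())) {
        continue; // never clean a savepointed base file
      }
      toDelete.add(slice.baseFileName());
    }
    return toDelete;
  }

  public static void main(String[] args) {
    List<Slice> slices = List.of(
        new Slice("f1_v3.parquet"), new Slice("f1_v2.parquet"), new Slice("f1_v1.parquet"));
    Set<String> savepointed = Set.of("f1_v1.parquet");
    // Without the check, the savepointed file is scheduled for deletion.
    System.out.println(filesToClean(slices, 1, savepointed, false)); // prints [f1_v2.parquet, f1_v1.parquet]
    // With the check, it survives.
    System.out.println(filesToClean(slices, 1, savepointed, true)); // prints [f1_v2.parquet]
  }
}
{code}

With keepVersions = 1 the newest slice is retained in both runs; only the run that consults savepointedFiles keeps f1_v1.parquet out of the delete list.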

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
