zxy created HUDI-4515:
-------------------------
Summary: savepoints will be cleaned in keeping latest versions policy
Key: HUDI-4515
URL: https://issues.apache.org/jira/browse/HUDI-4515
Project: Apache Hudi
Issue Type: Bug
Components: cleaning
Affects Versions: 0.11.1
Reporter: zxy
Attachments: image-2022-08-01-16-48-16-901.png
While testing how clean interacts with savepoints, I found that when the clean
policy is keeping latest versions, files that belong to a savepoint are deleted.
After reading the code, I believe this is a bug.
The relevant code is the getFilesToCleanKeepingLatestVersions method in
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
!image-2022-08-01-16-48-16-901.png|width=572,height=446!
It can be fixed like this:
{code:java}
while (fileSliceIterator.hasNext() && keepVersions > 0) {
  // Skip this most recent version
  fileSliceIterator.next();
  keepVersions--;
}
// Delete the remaining files
while (fileSliceIterator.hasNext()) {
  FileSlice nextSlice = fileSliceIterator.next();
  Option<HoodieBaseFile> dataFile = nextSlice.getBaseFile();
  if (dataFile.isPresent()
      && savepointedFiles.contains(dataFile.get().getFileName())) {
    // do not clean up a savepoint data file
    continue;
  }
  deletePaths.addAll(getCleanFileInfoForSlice(nextSlice));
}
{code}
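To make the intended behavior easier to check in isolation, here is a minimal standalone sketch of the same two-loop logic. It is not Hudi code: the class KeepLatestVersionsSketch, the method filesToDelete, and the use of plain file-name strings in place of FileSlice are all assumptions made for illustration. The input list is assumed to be ordered newest first, as the "most recent version" comment in the snippet above suggests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the proposed fix; not part of Hudi.
public class KeepLatestVersionsSketch {

    // File names stand in for file slices and are assumed to be ordered newest first.
    static List<String> filesToDelete(List<String> slicesNewestFirst,
                                      int keepVersions,
                                      Set<String> savepointedFiles) {
        List<String> deletePaths = new ArrayList<>();
        Iterator<String> it = slicesNewestFirst.iterator();

        // Skip the most recent versions that must be retained.
        while (it.hasNext() && keepVersions > 0) {
            it.next();
            keepVersions--;
        }

        // Delete the remaining versions, except files referenced by a savepoint.
        while (it.hasNext()) {
            String file = it.next();
            if (savepointedFiles.contains(file)) {
                continue;
            }
            deletePaths.add(file);
        }
        return deletePaths;
    }

    public static void main(String[] args) {
        List<String> slices = List.of("v4.parquet", "v3.parquet", "v2.parquet", "v1.parquet");
        Set<String> savepointed = Set.of("v2.parquet");
        // Keep the 2 latest versions; v2 is older but savepointed, so only v1 is deleted.
        System.out.println(filesToDelete(slices, 2, savepointed)); // prints [v1.parquet]
    }
}
```

In this sketch the savepoint check lives only in the delete loop, so a savepointed file older than the retained versions survives the clean, which is the behavior I expect.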
--
This message was sent by Atlassian Jira
(v8.20.10#820010)