[ https://issues.apache.org/jira/browse/HUDI-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-4515:
---------------------------------
Labels: bug clean pull-request-available (was: bug clean)
> savepoints will be clean in keeping latest versions policy
> ----------------------------------------------------------
>
> Key: HUDI-4515
> URL: https://issues.apache.org/jira/browse/HUDI-4515
> Project: Apache Hudi
> Issue Type: Bug
> Components: cleaning
> Affects Versions: 0.11.1
> Reporter: zxy
> Priority: Blocker
> Labels: bug, clean, pull-request-available
> Attachments: image-2022-08-01-16-48-16-901.png
>
>
> While testing how clean interacts with savepoints, I found that when the clean
> policy is keeping latest versions, the files belonging to a savepoint are
> deleted. Reading the code, this looks like a bug in
> getFilesToCleanKeepingLatestVersions, in
> hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
> !image-2022-08-01-16-48-16-901.png|width=572,height=446!
> It can be fixed like this:
>
> {code:java}
> while (fileSliceIterator.hasNext() && keepVersions > 0) {
>   // Skip this most recent version
>   fileSliceIterator.next();
>   keepVersions--;
> }
> // Delete the remaining files
> while (fileSliceIterator.hasNext()) {
>   FileSlice nextSlice = fileSliceIterator.next();
>   Option<HoodieBaseFile> dataFile = nextSlice.getBaseFile();
>   if (dataFile.isPresent() && savepointedFiles.contains(dataFile.get().getFileName())) {
>     // do not clean up a savepoint data file
>     continue;
>   }
>   deletePaths.addAll(getCleanFileInfoForSlice(nextSlice));
> }
> {code}
>
>
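The proposed loop structure can be tried out in isolation with a minimal sketch like the one below. It is not Hudi code: plain strings stand in for FileSlice and HoodieBaseFile, and the class name, the filesToClean helper, and the file names are all hypothetical. It only illustrates the idea of skipping the newest keepVersions entries first and then checking the savepoint set inside the deletion loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class KeepLatestVersionsSketch {

    // Hypothetical stand-in for the planner logic. `versions` holds the base
    // file names of a single file group, ordered newest first.
    static List<String> filesToClean(List<String> versions,
                                     int keepVersions,
                                     Set<String> savepointedFiles) {
        List<String> deletePaths = new ArrayList<>();
        Iterator<String> it = versions.iterator();

        // Retain the newest `keepVersions` entries unconditionally.
        while (it.hasNext() && keepVersions > 0) {
            it.next();
            keepVersions--;
        }

        // Everything older is a deletion candidate, unless a savepoint
        // still references it.
        while (it.hasNext()) {
            String fileName = it.next();
            if (savepointedFiles.contains(fileName)) {
                continue; // do not clean up a savepointed file
            }
            deletePaths.add(fileName);
        }
        return deletePaths;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList(
                "f1_v4.parquet", "f1_v3.parquet", "f1_v2.parquet", "f1_v1.parquet");
        Set<String> savepointed = new HashSet<>(Arrays.asList("f1_v2.parquet"));

        // Keeps v4 and v3, skips savepointed v2, and selects only v1.
        System.out.println(filesToClean(versions, 2, savepointed)); // [f1_v1.parquet]
    }
}
```

With two versions retained and f1_v2.parquet savepointed, only f1_v1.parquet is selected for deletion. With an empty savepoint set, both f1_v2.parquet and f1_v1.parquet would be selected.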
--
This message was sent by Atlassian Jira
(v8.20.10#820010)