[ 
https://issues.apache.org/jira/browse/HUDI-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-4515:
---------------------------------
    Labels: bug clean pull-request-available  (was: bug clean)

> savepoints will be clean in keeping latest versions policy
> ----------------------------------------------------------
>
>                 Key: HUDI-4515
>                 URL: https://issues.apache.org/jira/browse/HUDI-4515
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: cleaning
>    Affects Versions: 0.11.1
>            Reporter: zxy
>            Priority: Blocker
>              Labels: bug, clean, pull-request-available
>         Attachments: image-2022-08-01-16-48-16-901.png
>
>
> While testing how clean interacts with savepoints, I found that when the 
> cleaning policy is to keep the latest versions, files belonging to a 
> savepoint are deleted. From reading the code, this looks like a bug in 
> getFilesToCleanKeepingLatestVersions, in
> hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
> !image-2022-08-01-16-48-16-901.png|width=572,height=446!
> It can be fixed like this:
>  
> {code:java}
> while (fileSliceIterator.hasNext() && keepVersions > 0) {
>     // Skip this most recent version
>     fileSliceIterator.next();
>     keepVersions--;
> }
> // Delete the remaining files
> while (fileSliceIterator.hasNext()) {
>     FileSlice nextSlice = fileSliceIterator.next();
>     Option<HoodieBaseFile> dataFile = nextSlice.getBaseFile();
>     if (dataFile.isPresent()
>             && savepointedFiles.contains(dataFile.get().getFileName())) {
>         // do not clean up a savepoint data file
>         continue;
>     }
>     deletePaths.addAll(getCleanFileInfoForSlice(nextSlice));
> }{code}
>  
>  
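The loop proposed above can be tried outside Hudi with a small standalone sketch. It is only an illustration of the skip-then-delete logic, not Hudi code: the class KeepLatestVersionsSketch, the method filesToDelete, and the file names are hypothetical, and plain strings stand in for FileSlice and HoodieBaseFile. It assumes the file versions are listed newest first, as the "Skip this most recent version" comment in the snippet suggests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the cleaning loop; not part of Apache Hudi.
class KeepLatestVersionsSketch {

    // fileNamesNewestFirst: versions of one file group, newest first (assumption).
    // keepVersions: number of latest versions to retain.
    // savepointedFiles: file names referenced by a savepoint; these must survive.
    static List<String> filesToDelete(List<String> fileNamesNewestFirst,
                                      int keepVersions,
                                      Set<String> savepointedFiles) {
        List<String> deletePaths = new ArrayList<>();
        Iterator<String> it = fileNamesNewestFirst.iterator();

        // Skip the most recent versions that the policy retains.
        while (it.hasNext() && keepVersions > 0) {
            it.next();
            keepVersions--;
        }

        // Everything older is a deletion candidate, except savepointed files.
        while (it.hasNext()) {
            String name = it.next();
            if (savepointedFiles.contains(name)) {
                // Do not clean up a file that a savepoint still references.
                continue;
            }
            deletePaths.add(name);
        }
        return deletePaths;
    }

    public static void main(String[] args) {
        List<String> versions =
            List.of("f_v3.parquet", "f_v2.parquet", "f_v1.parquet", "f_v0.parquet");
        List<String> toDelete = filesToDelete(versions, 1, Set.of("f_v1.parquet"));
        System.out.println(toDelete); // prints [f_v2.parquet, f_v0.parquet]
    }
}
```

With one version kept and f_v1.parquet savepointed, the newest file is retained by the policy, the savepointed file is skipped, and only the two remaining older files are scheduled for deletion.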



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
