Zouxxyy commented on code in PR #8364:
URL: https://github.com/apache/hudi/pull/8364#discussion_r1188671535
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -215,6 +216,21 @@ private List<String> getPartitionPathsForFullCleaning() {
return FSUtils.getAllPartitionPaths(context, config.getMetadataConfig(),
config.getBasePath());
}
+ /**
+ * Verify whether file slice exists in savepointedFiles, check both base file and log files
+ */
+ private boolean isFsExistInSavepointedFiles(FileSlice fs, List<String> savepointedFiles) {
+ if (fs.getBaseFile().isPresent() && savepointedFiles.contains(fs.getBaseFile().get().getFileName())) {
+ return true;
+ }
Review Comment:
> Should we return true if base file exists while log file is missing?
Currently, when clean runs, the file slice is the smallest unit it
operates on, so as long as one file in the fs matches a savepointed
file, the whole file slice is kept rather than deleted.
For example:
```text
(t1) fs: base, log1
(t2) run savepoint
(t3) fs: base, log1, log2
...
(tn) run clean
```
The entire fs is preserved, including log2, which was written after the
savepoint. Here I follow the old logic.
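
A rough sketch of that rule, to make the timeline concrete. The types
here are made-up stand-ins (`SimpleFileSlice`, `SavepointSliceCheck` and
`isSliceSavepointed` are hypothetical, not Hudi's real classes or the
code in this PR); it only illustrates "one matching file keeps the
whole slice":

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for a file slice: an optional base
// file name plus zero or more log file names. Not Hudi's FileSlice.
final class SimpleFileSlice {
  final Optional<String> baseFileName;
  final List<String> logFileNames;

  SimpleFileSlice(Optional<String> baseFileName, List<String> logFileNames) {
    this.baseFileName = baseFileName;
    this.logFileNames = logFileNames;
  }
}

final class SavepointSliceCheck {
  // Returns true if ANY file of the slice (base or log) is in the
  // savepointed list; the cleaner would then keep the whole slice.
  static boolean isSliceSavepointed(SimpleFileSlice slice, List<String> savepointedFiles) {
    if (slice.baseFileName.isPresent()
        && savepointedFiles.contains(slice.baseFileName.get())) {
      return true;
    }
    return slice.logFileNames.stream().anyMatch(savepointedFiles::contains);
  }

  public static void main(String[] args) {
    // (t2) savepoint captured base and log1
    List<String> savepointed = Arrays.asList("base", "log1");
    // (t3) the slice has since gained log2
    SimpleFileSlice atClean =
        new SimpleFileSlice(Optional.of("base"), Arrays.asList("log1", "log2"));
    // (tn) clean: the slice matches, so base, log1 and log2 all stay
    System.out.println(isSliceSavepointed(atClean, savepointed));
  }
}
```

Note that log2 is never checked individually: the decision is made per
slice, which is why it survives even though the savepoint never saw it.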