hussein-awala commented on PR #6498: URL: https://github.com/apache/hudi/pull/6498#issuecomment-1301473361
I was working on the same idea before finding this PR. There is one case this PR does not take into account. Let's assume we have these configs:

- the cleaning policy is `KEEP_LATEST_FILE_VERSIONS`
- incremental cleaning is enabled
- `cleaner.fileversions.retained` is X

If the next clean uses a `cleaner.fileversions.retained` >= X, everything is fine: we just need to check the partitions changed since the last clean and clean them. But if the next clean lowers `cleaner.fileversions.retained` to any value < X, we cannot use incremental cleaning. Partitions untouched since the last clean can still hold up to X file versions, so we need to run `getPartitionPathsForFullCleaning` to recheck all the partitions.

So, to decide whether we can use incremental cleaning or need a brute-force check, we need to save the `cleaner.fileversions.retained` value used in the last clean in the clean commit Avro file.
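A rough sketch of the proposed check, assuming the last clean's `cleaner.fileversions.retained` value has been persisted and read back as `lastRetained`. Everything here is hypothetical: `IncrementalCleanDecision`, `canCleanIncrementally`, `partitionsToClean` and `lastRetained` are illustration names, not Hudi APIs, and a precomputed `allPartitions` list stands in for what `getPartitionPathsForFullCleaning` would return.

```java
import java.util.List;
import java.util.OptionalInt;

/**
 * Hypothetical sketch of the incremental-vs-full decision for the
 * KEEP_LATEST_FILE_VERSIONS policy. Assumes the previous clean stored the
 * cleaner.fileversions.retained value it used (lastRetained); that field
 * does not necessarily exist in Hudi's clean metadata today.
 */
public class IncrementalCleanDecision {

    /**
     * Returns true when only the partitions changed since the last clean need checking.
     *
     * @param incrementalEnabled whether incremental cleaning is turned on
     * @param lastRetained       retained-versions value recorded by the previous clean, if any
     * @param currentRetained    retained-versions value configured for this clean
     */
    public static boolean canCleanIncrementally(boolean incrementalEnabled,
                                                OptionalInt lastRetained,
                                                int currentRetained) {
        if (!incrementalEnabled || !lastRetained.isPresent()) {
            // Incremental cleaning is off, or there is no record of the previous
            // setting (e.g. first clean, older metadata): check every partition.
            return false;
        }
        // Assuming every earlier clean followed this rule, partitions untouched
        // since the last clean hold at most lastRetained versions per file group,
        // so they are still compliant only if the new limit is not smaller.
        return currentRetained >= lastRetained.getAsInt();
    }

    /** Picks which partitions to scan; allPartitions stands in for a full listing. */
    public static List<String> partitionsToClean(boolean incrementalEnabled,
                                                 OptionalInt lastRetained,
                                                 int currentRetained,
                                                 List<String> changedSinceLastClean,
                                                 List<String> allPartitions) {
        return canCleanIncrementally(incrementalEnabled, lastRetained, currentRetained)
                ? changedSinceLastClean
                : allPartitions;
    }

    public static void main(String[] args) {
        List<String> changed = List.of("2022/11/02");
        List<String> all = List.of("2022/11/01", "2022/11/02");
        // Previous clean used X = 3.
        System.out.println(partitionsToClean(true, OptionalInt.of(3), 5, changed, all)); // [2022/11/02]
        System.out.println(partitionsToClean(true, OptionalInt.of(3), 2, changed, all)); // [2022/11/01, 2022/11/02]
    }
}
```

Treating a missing `lastRetained` as "do a full scan" keeps the check safe for the first clean and for clean metadata written before such a field would exist.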
