hussein-awala commented on PR #6498:
URL: https://github.com/apache/hudi/pull/6498#issuecomment-1301473361

   I was working on the same idea before I found this PR. There is one case 
that this PR does not take into account.
   
   Let's assume we have these configs:
   - the cleaning policy is `KEEP_LATEST_FILE_VERSIONS`
   - incremental cleaning is enabled
   - `cleaner.fileversions.retained` is X
   
   If the next clean uses a `cleaner.fileversions.retained` value >= X, 
everything is fine: we only need to check the partitions changed since the last 
clean and clean them. But if the next clean uses any value < X, we cannot use 
incremental cleaning, because partitions that have not changed since the last 
clean may still hold more file versions than the new limit allows. In that case 
we need to run the method `getPartitionPathsForFullCleaning` to recheck all the 
partitions.
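
To illustrate the problem, here is a small simulation. It is a simplified model, not Hudi code: each partition is reduced to a plain list of version numbers, and the function name `versions_to_delete` and the partition names are made up for this example.

```python
def versions_to_delete(partitions, retained, only=None):
    """Return {partition: versions to delete}, keeping the newest `retained` versions.

    `only` restricts the check to the given partitions, which mimics an
    incremental clean that looks only at partitions changed since the last clean.
    """
    result = {}
    for name, versions in partitions.items():
        if only is not None and name not in only:
            continue
        ordered = sorted(versions)
        # Guard retained == 0, because ordered[:-0] would be an empty slice.
        stale = ordered[:-retained] if retained > 0 else ordered
        if stale:
            result[name] = stale
    return result


# State after a clean with X = 3, followed by one new write to "2022/10/02".
partitions = {
    "2022/10/01": [1, 2, 3],      # untouched since the last clean
    "2022/10/02": [4, 5, 6, 7],   # changed since the last clean
}
changed = {"2022/10/02"}

# The next clean lowers the retained count from 3 to 2.
incremental = versions_to_delete(partitions, retained=2, only=changed)
full = versions_to_delete(partitions, retained=2)
```

In this model the incremental pass only finds versions 4 and 5 in the changed partition, while the full pass also finds version 1 in the untouched partition, which now exceeds the new limit.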
   
   So to decide whether we can use incremental cleaning or need to run a 
brute-force check, we need to save the `cleaner.fileversions.retained` value 
used in the last clean in the clean commit Avro file.
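
The decision could look roughly like the sketch below. The function and parameter names are hypothetical, and treating a missing stored value as "do a full clean" is my assumption for clean metadata written before the value was recorded.

```python
from typing import Optional


def can_use_incremental_clean(last_retained: Optional[int],
                              current_retained: int) -> bool:
    """Decide whether checking only the changed partitions is sufficient.

    last_retained: the retained-versions value stored with the last clean,
        or None if it was never recorded (assumed case: older metadata).
    current_retained: the value configured for the upcoming clean.
    """
    if last_retained is None:
        # Unknown previous value: be conservative and recheck every partition.
        return False
    # Retaining the same number of versions or more cannot make files in
    # unchanged partitions newly eligible for deletion.
    return current_retained >= last_retained
```

When this returns `False`, the cleaner would fall back to listing all partitions, which is the full-cleaning path mentioned above.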

