hussein-awala commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1291019081

   I tested the PR in our project and it works as expected. For each clean we
have the 3 states (requested, inflight and completed), and the clean planner
checks only the partitions that have been modified since
`earliestCommitToRetain`.
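   For context, here is a minimal Java sketch (Java 16+ for the record) of the incremental idea. It is not Hudi's actual planner code: the `Commit` record, `partitionsToClean` and the sample instant times are hypothetical, and it assumes instant times are fixed-width timestamps, so comparing them as strings matches time order.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class IncrementalCleanSketch {
    // Hypothetical commit metadata: the instant time and the partitions it wrote to.
    record Commit(String instantTime, List<String> partitions) {}

    // Collects only the partitions written by commits at or after the given
    // earliestCommitToRetain, instead of listing every partition in storage.
    // Assumes fixed-width timestamps, so string order matches time order.
    static SortedSet<String> partitionsToClean(List<Commit> timeline, String earliestCommitToRetain) {
        SortedSet<String> partitions = new TreeSet<>();
        for (Commit commit : timeline) {
            if (commit.instantTime().compareTo(earliestCommitToRetain) >= 0) {
                partitions.addAll(commit.partitions());
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Commit> timeline = List.of(
                new Commit("20221020010000", List.of("dt=2022-10-01")),
                new Commit("20221020020000", List.of("dt=2022-10-20")),
                new Commit("20221020030000", List.of("dt=2022-10-20", "dt=2022-10-19")));
        // Prints [dt=2022-10-19, dt=2022-10-20]; dt=2022-10-01 is never listed.
        System.out.println(partitionsToClean(timeline, "20221020020000"));
    }
}
```

   The point of the design is that the cost of planning a clean scales with the number of recently touched partitions rather than with the total number of partitions on S3.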
   
   Recently, we increased `CLEAN_MAX_COMMITS` to 24, as @nsivabalan
[proposed](https://github.com/apache/hudi/issues/6953#issuecomment-1283143573),
in order to clean the tables every 24 hours (we have one commit per hour) and
to avoid listing S3 partitions in tables with infrequently changed partitions.
However, the config doesn't work as expected: after 24 commits, if the list of
files to delete is empty, the cleaner runs again at every subsequent commit
until it deletes something. This happens because, for the clean planner, the
last clean is the last one that had files to delete; the later clean operations
are not taken into account because they write nothing to the timeline.
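   A minimal Java simulation of this behaviour, under stated assumptions: `countCleanRuns` is a hypothetical helper, not Hudi code. It assumes the cleaner triggers once the number of commits since the last clean recorded on the timeline reaches `CLEAN_MAX_COMMITS`, and that no clean ever finds files to delete.

```java
public class CleanTriggerSketch {
    // Hypothetical model, not Hudi code: counts how often the cleaner runs over
    // numCommits commits. A clean is triggered once the number of commits since
    // the last clean recorded on the timeline reaches maxCommits.
    // Assumption: no clean ever finds files to delete (infrequently changed partitions).
    static int countCleanRuns(int numCommits, int maxCommits, boolean recordEmptyCleans) {
        int commitsSinceRecordedClean = 0;
        int cleanRuns = 0;
        for (int i = 0; i < numCommits; i++) {
            commitsSinceRecordedClean++;
            if (commitsSinceRecordedClean >= maxCommits) {
                cleanRuns++;
                // An empty clean only resets the counter if it is written to the timeline.
                if (recordEmptyCleans) {
                    commitsSinceRecordedClean = 0;
                }
            }
        }
        return cleanRuns;
    }

    public static void main(String[] args) {
        // 48 hourly commits (two days) with CLEAN_MAX_COMMITS = 24.
        System.out.println("empty cleans not recorded: " + countCleanRuns(48, 24, false));
        System.out.println("empty cleans recorded:     " + countCleanRuns(48, 24, true));
    }
}
```

   With 48 hourly commits and a threshold of 24, the sketch runs the cleaner 25 times when empty cleans are not recorded (at every commit from the 24th on) versus 2 times when they are, which is the once-a-day schedule we were aiming for.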
   
   In brief, we need this patch ASAP. Can you please add it to 0.13.0?

