umehrot2 commented on issue #7600:
URL: https://github.com/apache/hudi/issues/7600#issuecomment-1405874402

   @SabyasachiDasTR @koochiswathiTR The issue here is similar to 
https://github.com/apache/hudi/issues/3739 . I believe what is happening here 
is that you are setting CLEANER_HOURS_RETAINED to 2 days, but meanwhile 
archival is running more aggressively. By default, archival will keep at most 
30 commits in the active timeline - 
https://hudi.apache.org/docs/0.11.1/configurations#hoodiekeepmaxcommits. Hence, 
in your case, by the time the cleaner runs and tries to clean up commits 
older than 2 days, those commits have already been archived. So even though 
the cleaner is scheduled, it finds nothing to clean, which matches the logs 
you have provided.
   
   If you want to continue with your current cleaner config, you should set 
https://hudi.apache.org/docs/0.11.1/configurations#hoodiekeepmaxcommits 
higher than the number of commits you produce in a span of 2 days. Essentially, 
you want the cleaner to run at a higher frequency than archival, so it gets to 
commits before they leave the active timeline.
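   To make the sizing concrete, here is a minimal sketch (plain Python, with a 
hypothetical commit rate - the function name and the safety margin are my own, 
not anything from Hudi) of how you might lower-bound `hoodie.keep.max.commits` 
so that archival cannot outpace a 48-hour cleaner retention window:

```python
import math

def min_keep_max_commits(commits_per_hour: float,
                         retained_hours: int,
                         safety_margin: int = 20) -> int:
    """Lower bound for hoodie.keep.max.commits: enough slots that a commit
    stays in the active timeline at least as long as the cleaner's
    retention window, plus a small safety margin."""
    return math.ceil(commits_per_hour * retained_hours) + safety_margin

# Hypothetical workload: one commit every 15 minutes (4/hour),
# cleaner retains 48 hours (2 days) of commits.
print(min_keep_max_commits(commits_per_hour=4, retained_hours=48))  # 212
```

   Note that archival trims the timeline down to `hoodie.keep.min.commits` once 
the count exceeds the max, so the same bound should arguably apply to the min 
setting as well.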
   
   As for cleaning the data, you should disable 
https://hudi.apache.org/docs/configurations/#hoodiecleanerincrementalmode while 
running the clean manually. This is needed because, in your case, you want the 
cleaner to go back in time and clean dangling files that are older than the 
last time the cleaner was run.
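   A minimal sketch of what the resulting cleaner configuration could look like 
(the property keys are standard Hudi cleaner/archival configs, but the 
retention and timeline values below are illustrative assumptions, not a 
recommendation for your exact workload):

```properties
# Disable incremental cleaning so the manual clean scans the full timeline,
# not just commits since the cleaner's last successful run (default: true).
hoodie.cleaner.incremental.mode=false

# The 2-day retention discussed above.
hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS
hoodie.cleaner.hours.retained=48

# Keep the active timeline large enough that archival does not move commits
# away before the cleaner can see them (values are illustrative).
hoodie.keep.min.commits=250
hoodie.keep.max.commits=260
```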
   

