nsivabalan commented on issue #6900:
URL: https://github.com/apache/hudi/issues/6900#issuecomment-1284357523

   Another suggestion: if you feel that running the cleaner inline is causing 
a perf hit, you can relax the cleaner to run only once every N commits, using 
`hoodie.clean.max.commits`. With hoodie.clean.max.commits=N, even the attempt 
to check whether something needs to be cleaned happens only once every N 
commits. 
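
   As a rough sketch of how this could be wired up from PySpark, the snippet 
below builds a writer option map that sets `hoodie.clean.max.commits`. The 
helper name `relaxed_cleaner_options` is made up for illustration and is not 
part of Hudi; only the config key comes from the comment above.

```python
def relaxed_cleaner_options(clean_every_n_commits: int) -> dict:
    """Hypothetical helper (not a Hudi API): build writer options that make
    Hudi attempt clean scheduling only once every N commits."""
    if clean_every_n_commits < 1:
        raise ValueError("clean_every_n_commits must be >= 1")
    # Hudi writer options are passed as strings.
    return {"hoodie.clean.max.commits": str(clean_every_n_commits)}


opts = relaxed_cleaner_options(5)
print(opts)

# Assumed usage with an existing Spark DataFrame `df` (not run here):
#   df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

   The returned map is meant to be merged with the rest of your Hudi write 
options (table name, record key, and so on), which are omitted here.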
   
   
   Do not confuse this w/ `hoodie.cleaner.commits.retained`. Let's say you set 
hoodie.cleaner.commits.retained = 10, but hoodie.clean.max.commits=2.
   
   Every 2 commits, the hudi cleaner will check whether there are more than 10 
commits in the active timeline and, if so, clean the data files. If you are ok 
to give some leeway, you can increase the value of hoodie.clean.max.commits to 
5 or 10. With a value of 5, for example, even clean scheduling will be 
attempted only once every 5 commits. 
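
   To make the interplay between the two settings concrete, here is a toy 
simulation. It is only a simplified model of the behavior described above, not 
Hudi's actual cleaner logic: it counts commits since the last clean attempt, 
and on each attempt it "cleans" only if more than `commits_retained` commits 
have accumulated. The function and variable names are assumptions made for 
this sketch.

```python
def simulate_clean_attempts(total_commits, max_commits, commits_retained):
    """Toy model (not Hudi code): return the commit numbers at which a clean
    is attempted and the commit numbers at which something is cleaned."""
    attempts = []            # commits where the cleaner checks for work
    cleans = []              # commits where the check finds work to do
    uncleaned = 0            # commits accumulated and not yet cleaned up
    since_last_attempt = 0   # commits since the cleaner last checked

    for commit in range(1, total_commits + 1):
        uncleaned += 1
        since_last_attempt += 1
        # hoodie.clean.max.commits: only check once every N commits.
        if since_last_attempt >= max_commits:
            attempts.append(commit)
            since_last_attempt = 0
            # hoodie.cleaner.commits.retained: clean only beyond this many.
            if uncleaned > commits_retained:
                cleans.append(commit)
                uncleaned = commits_retained
    return attempts, cleans


# retained=10, max.commits=2: checks every 2 commits.
print(simulate_clean_attempts(14, 2, 10))
# retained=10, max.commits=5: checks only every 5 commits.
print(simulate_clean_attempts(20, 5, 10))
```

   In this model, raising the check interval from 2 to 5 means fewer attempts 
overall, at the cost of letting a few more commits pile up between cleans.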
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
