nsivabalan commented on issue #6900: URL: https://github.com/apache/hudi/issues/6900#issuecomment-1284357523
Another suggestion: if you feel that running the cleaner inline is causing a performance hit, you can relax it to run only once every N commits using `hoodie.clean.max.commits`. With `hoodie.clean.max.commits=N`, Hudi only attempts to check whether anything needs cleaning once every N commits. Do not confuse this with `hoodie.cleaner.commits.retained`. Let's say you set `hoodie.cleaner.commits.retained=10` but `hoodie.clean.max.commits=2`. Every 2 commits, the Hudi cleaner will check whether there are more than 10 commits in the active timeline and, if so, clean the data files. If you are OK with some leeway, you can increase `hoodie.clean.max.commits` to 5 or 10, so that clean scheduling is attempted only once every 5 (or 10) commits.
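As a rough illustration of how the two settings in this comment differ, here is a minimal sketch. The option keys and values come from the comment; the dictionary name `hudi_clean_options` and the helper `clean_attempt_commits` are hypothetical. The helper is a simplified model of "attempt once every N commits", not Hudi's actual scheduling code.

```python
# Hudi write options as plain string key/value pairs (values from the example above).
hudi_clean_options = {
    # How many commits to retain; the cleaner removes data files older than this window.
    "hoodie.cleaner.commits.retained": "10",
    # How often (in commits) clean scheduling is even attempted.
    "hoodie.clean.max.commits": "2",
}


def clean_attempt_commits(total_commits, max_commits):
    """Return the 1-based commit numbers at which a clean check is attempted.

    Simplified assumption: a check is attempted on every max_commits-th commit.
    """
    return [c for c in range(1, total_commits + 1) if c % max_commits == 0]


max_commits = int(hudi_clean_options["hoodie.clean.max.commits"])
print(clean_attempt_commits(10, max_commits))
```

With a Spark writer, a dictionary like this could be passed along with the other Hudi options, for example via `.options(**hudi_clean_options)`. Raising `hoodie.clean.max.commits` to 5 would, under the same simplified model, reduce the checks to commits 5 and 10.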
