parisni commented on issue #6373: URL: https://github.com/apache/hudi/issues/6373#issuecomment-1235585957
After thinking about it, I guess listing the files to delete through the MDT could be much more efficient if the lookups were batched by partition. Currently this leads to N calls to the MDT, where N is the number of partitions. This is highly inefficient with 100k partitions.

On August 30, 2022 11:41:19 PM UTC, Sivabalan Narayanan ***@***.***> wrote:
> And regarding your statement `Also I guess there is a problem to use incremental cleaning together with KEEP_LATEST_COMMITS which lead to never clean some partitions after a first clean but I will open a separate issue for this one.`: if you do create a new issue, let me know and tag me in it.
