voonhous opened a new issue, #18962: URL: https://github.com/apache/hudi/issues/18962
### Describe the problem

`CleanActionExecutor.deleteFileAndGetResult` calls `storage.getPathInfo` on every path in the cleaner plan to check `isDirectory` before deleting it. Every file entry in the plan is a base file, log file, or bootstrap base file path, never a directory, so the check costs one wasted RPC per file. On cloud object stores such as S3 or GCS, where each metadata lookup is a separate request, this roughly doubles the request count and latency of the clean execution phase.

### Proposed fix

Delete plan file entries directly with `deleteFile` and drop the `getPathInfo` call. Keep the `isDirectory` handling only for `partitionsToBeDeleted` entries, which can be directories.

Will raise a PR for this.
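The following is a minimal sketch of the proposed change. It assumes a hypothetical `SketchStorage` interface in place of Hudi's real storage API, and the names `SketchStorage`, `CountingStorage`, `CleanSketch`, `deleteWithLookup`, `deletePlanFiles` and `deletePartitions` are illustrative only, not taken from the Hudi codebase. It counts storage requests to show that dropping the per-file lookup halves the requests for plan file entries, while partition paths keep the directory check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal storage interface for illustration only; it is NOT
// Hudi's storage API. Each method models one request to the object store.
interface SketchStorage {
  boolean isDirectory(String path);     // stands in for the getPathInfo(...) lookup
  boolean deleteFile(String path);
  boolean deleteDirectory(String path);
}

// In-memory storage that counts every request, so the saving is observable.
class CountingStorage implements SketchStorage {
  final Set<String> files = new HashSet<>();
  final Set<String> dirs = new HashSet<>();
  int requests = 0;

  @Override public boolean isDirectory(String path) { requests++; return dirs.contains(path); }
  @Override public boolean deleteFile(String path) { requests++; return files.remove(path); }
  @Override public boolean deleteDirectory(String path) { requests++; return dirs.remove(path); }
}

final class CleanSketch {
  private CleanSketch() {}

  // Current behaviour: look up every entry before deleting it (2 requests per file).
  static int deleteWithLookup(SketchStorage storage, List<String> paths) {
    int deleted = 0;
    for (String path : paths) {
      boolean ok = storage.isDirectory(path)
          ? storage.deleteDirectory(path)
          : storage.deleteFile(path);
      if (ok) deleted++;
    }
    return deleted;
  }

  // Proposed behaviour: plan file entries (base, log, bootstrap base files) are
  // never directories, so delete them directly (1 request per file).
  static int deletePlanFiles(SketchStorage storage, List<String> filePaths) {
    int deleted = 0;
    for (String path : filePaths) {
      if (storage.deleteFile(path)) deleted++;
    }
    return deleted;
  }

  // Entries from partitionsToBeDeleted may be directories, so the lookup stays here.
  static int deletePartitions(SketchStorage storage, List<String> partitionPaths) {
    return deleteWithLookup(storage, partitionPaths);
  }
}
```

With three plan files, `deleteWithLookup` issues six requests while `deletePlanFiles` issues three; a directory passed to `deletePartitions` still costs a lookup plus a delete.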
