voonhous opened a new issue, #18962:
URL: https://github.com/apache/hudi/issues/18962

   ### Describe the problem
   
   CleanActionExecutor.deleteFileAndGetResult calls storage.getPathInfo on 
every path in the cleaner plan to check isDirectory before deleting it. Every 
file entry in the plan is a base file, log file or bootstrap base file path, 
never a directory, so the check is one wasted RPC per file. On cloud storage 
such as S3 or GCS this doubles the request count and latency of the clean 
execution phase.
   
   ### Proposed fix
   
   Delete plan file entries directly with deleteFile and drop the getPathInfo 
call. Keep the isDirectory handling only for partitionsToBeDeleted entries, 
which can be directories.
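
The request-count difference can be illustrated with a minimal sketch. It does not use Hudi's classes: `SimpleStorage`, `CountingStorage`, `deleteWithCheck` and `deleteDirect` are hypothetical names made up for this example, and the in-memory fake simply counts one request per call, assuming each metadata lookup and each delete maps to one remote request. It also leaves out the error handling a real executor would need.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CleanDeleteSketch {

    /** Hypothetical stand-in for a storage client; not Hudi's actual interface. */
    interface SimpleStorage {
        boolean isDirectory(String path); // assumed to cost one metadata request
        boolean deleteFile(String path);  // assumed to cost one delete request
    }

    /** In-memory fake that counts every request it receives. */
    static class CountingStorage implements SimpleStorage {
        private final Set<String> files = new HashSet<>();
        int requests = 0;

        CountingStorage(List<String> initialFiles) {
            files.addAll(initialFiles);
        }

        @Override
        public boolean isDirectory(String path) {
            requests++;
            return false; // this fake only ever holds files
        }

        @Override
        public boolean deleteFile(String path) {
            requests++;
            return files.remove(path);
        }
    }

    /** Pattern described in the issue: look up the path type, then delete. */
    static int deleteWithCheck(SimpleStorage storage, List<String> paths) {
        int deleted = 0;
        for (String path : paths) {
            if (!storage.isDirectory(path) && storage.deleteFile(path)) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Proposed pattern: plan entries are always files, so delete directly. */
    static int deleteDirect(SimpleStorage storage, List<String> paths) {
        int deleted = 0;
        for (String path : paths) {
            if (storage.deleteFile(path)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<String> plan = List.of("p1/base.parquet", "p1/.f1.log.1", "p2/base.parquet");
        CountingStorage before = new CountingStorage(plan);
        CountingStorage after = new CountingStorage(plan);
        deleteWithCheck(before, plan);
        deleteDirect(after, plan);
        System.out.println("with check: " + before.requests + " requests"); // 6
        System.out.println("direct:     " + after.requests + " requests");  // 3
    }
}
```

Both variants delete the same three files; only the number of storage calls differs, which is the saving the proposal is after.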
   
   Will raise a PR for this.

