bvaradar commented on issue #2423:
URL: https://github.com/apache/hudi/issues/2423#issuecomment-758433327


   Hudi does not synchronize on partition path creation. Instead, each executor 
task that is about to write a parquet file ensures the directory path exists 
by issuing an fs.mkdirs call. Added a JIRA for this: 
https://issues.apache.org/jira/browse/HUDI-1523
   
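   If mkdirs is a costly API, can you try this patch? It trades the mkdirs 
call for an fs.exists() check (which calls getFileStatus()), so mkdirs is 
issued only when the partition path does not exist yet: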
   `diff --git a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   index d148b1b8..11b3cb49 100644
   --- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   +++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
   @@ -105,7 +105,9 @@ public abstract class HoodieWriteHandle<T extends HoodieRecordPayload> extends H
      public Path makeNewPath(String partitionPath) {
        Path path = FSUtils.getPartitionPath(config.getBasePath(), partitionPath);
        try {
   -      fs.mkdirs(path); // create a new partition as needed.
   +      if (!fs.exists(path)) {
   +        fs.mkdirs(path); // create a new partition as needed.
   +      }
        } catch (IOException e) {
          throw new HoodieIOException("Failed to make dir " + path, e);
        }`
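
   To try the check-then-create pattern in isolation, here is a minimal sketch. It uses java.nio.file as a stand-in for the Hadoop FileSystem API, so it is not the Hudi code path; the class name PartitionDirs and the method ensurePartitionDir are hypothetical names for illustration only. Because nothing synchronizes the tasks, two of them can both see the directory as missing and both attempt the create, so the create call must stay idempotent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the makeNewPath logic in the patch above,
// written against java.nio.file instead of Hadoop's FileSystem.
public class PartitionDirs {

    // Returns the partition directory, creating it only if it is missing.
    static Path ensurePartitionDir(Path basePath, String partitionPath) {
        Path dir = basePath.resolve(partitionPath);
        try {
            // Existence check first: this stands in for the cheaper
            // status lookup that replaces an unconditional create.
            if (!Files.exists(dir)) {
                // createDirectories does not fail if another task created
                // the directory between the check and this call.
                Files.createDirectories(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to make dir " + dir, e);
        }
        return dir;
    }
}
```

   Whether this is a net win depends on how the cost of the existence check compares with mkdirs on your storage layer, which is why the patch is worth measuring before adopting.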

