bvaradar commented on issue #2423: URL: https://github.com/apache/hudi/issues/2423#issuecomment-758433327
Hudi does not synchronize on partition path creation. Instead, each executor task that is about to write a parquet file ensures the directory path exists by issuing an `fs.mkdirs` call. Filed: https://issues.apache.org/jira/browse/HUDI-1523

If `mkdirs` is a costly API, can you try this patch? It trades the `mkdirs` call for a `getFileStatus()` call, which is what `fs.exists` performs:

```diff
diff --git a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
index d148b1b8..11b3cb49 100644
--- a/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
+++ b/hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
@@ -105,7 +105,9 @@ public abstract class HoodieWriteHandle<T extends HoodieRecordPayload> extends H
   public Path makeNewPath(String partitionPath) {
     Path path = FSUtils.getPartitionPath(config.getBasePath(), partitionPath);
     try {
-      fs.mkdirs(path); // create a new partition as needed.
+      if (!fs.exists(path)) {
+        fs.mkdirs(path); // create a new partition as needed.
+      }
     } catch (IOException e) {
       throw new HoodieIOException("Failed to make dir " + path, e);
     }
```
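For illustration only, here is a minimal local-filesystem sketch of the same check-before-create idea. It assumes `java.nio.file` as a stand-in for Hadoop's `FileSystem` so the example is self-contained. The class `PartitionDirs` and the `CREATE_CALLS` counter are hypothetical and exist only to show that repeated writes into an existing partition skip the create call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical local-filesystem analogue of HoodieWriteHandle.makeNewPath;
// java.nio.file stands in for Hadoop's FileSystem (assumption).
final class PartitionDirs {

  // Hypothetical counter, only to show how many create calls were issued.
  static final AtomicInteger CREATE_CALLS = new AtomicInteger();

  private PartitionDirs() {}

  static Path makeNewPath(String basePath, String partitionPath) {
    Path path = Paths.get(basePath, partitionPath);
    try {
      // Existence check first; skip the create call when the
      // partition directory is already there.
      if (!Files.exists(path)) {
        CREATE_CALLS.incrementAndGet();
        // createDirectories does not fail if the directory already exists,
        // so another task creating it between the check and this call
        // is harmless.
        Files.createDirectories(path);
      }
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to make dir " + path, e);
    }
    return path;
  }
}
```

When the directory is missing, both the check and the create are issued, so the patch only pays off when most tasks write into partitions that already exist.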
