[ 
https://issues.apache.org/jira/browse/GOBBLIN-1619?focusedWorklogId=752536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752536
 ]

ASF GitHub Bot logged work on GOBBLIN-1619:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/22 22:37
            Start Date: 04/Apr/22 22:37
    Worklog Time Spent: 10m 
      Work Description: hanghangliu commented on code in PR #3477:
URL: https://github.com/apache/gobblin/pull/3477#discussion_r842190964


##########
gobblin-utility/src/main/java/org/apache/gobblin/util/WriterUtils.java:
##########
@@ -307,10 +303,20 @@ public static void mkdirsWithRecursivePermissionWithRetry(final FileSystem fs, f
         throw new IOException("Path " + path + "does not exist however it should. Giving up..."+ e);
       }
     }
+  }
+
+  private static void gobblinMkDirs(final FileSystem fs, final Path path, FsPermission perm) throws IOException {
+    Set<Path> parentsThatDidntExistBefore = new HashSet<>();
+    for (Path p = path.getParent(); p != null && !fs.exists(p); p = p.getParent()) {
+      parentsThatDidntExistBefore.add(p);
+    }
+
+    if (!FileSystem.mkdirs(fs, path, perm)) {
+      throw new IOException(String.format("Unable to mkdir %s with permission %s", path, perm));
+    }
 
-    // Double check permission, since fs.mkdirs() may not guarantee to set the permission correctly

Review Comment:
   Do you think it's worth keeping the `if` condition before setting the permission?
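
   To make the question concrete, here is a minimal sketch of what such a guard could look like. It is not the PR's code: it uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` so that it runs without extra dependencies, it assumes a POSIX file system, and the class and method names (`MkdirsPermissionSketch`, `mkdirsWithPermission`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: java.nio.file instead of Hadoop's FileSystem, POSIX only.
public class MkdirsPermissionSketch {

  /**
   * Creates dir and any missing ancestors, then applies perm to each directory
   * that did not exist before the call. Returns the number of permission
   * updates that were actually issued.
   */
  public static int mkdirsWithPermission(Path dir, Set<PosixFilePermission> perm)
      throws IOException {
    // Record every directory that does not exist yet, deepest first.
    List<Path> missingBefore = new ArrayList<>();
    for (Path p = dir; p != null && !Files.exists(p); p = p.getParent()) {
      missingBefore.add(p);
    }

    // A single call creates the whole chain and tolerates directories that already exist.
    Files.createDirectories(dir);

    int updates = 0;
    for (Path p : missingBefore) {
      // The guard under discussion: skip the write when the permission already matches.
      if (!Files.getPosixFilePermissions(p).equals(perm)) {
        Files.setPosixFilePermissions(p, perm);
        updates++;
      }
    }
    return updates;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("mkdirs-sketch");
    Set<PosixFilePermission> perm = PosixFilePermissions.fromString("rwxr-x---");
    Path target = root.resolve("a/b/c");
    int updates = mkdirsWithPermission(target, perm);
    System.out.println("permission updates issued: " + updates);
    System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(target)));
  }
}
```

   In this sketch the guard costs one extra read per directory and saves a write whenever the permission is already correct. Whether that trade is worthwhile on HDFS depends on the relative cost of `getFileStatus` and `setPermission` calls there, which the sketch does not measure.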





Issue Time Tracking
-------------------

    Worklog Id:     (was: 752536)
    Time Spent: 2h 20m  (was: 2h 10m)

> WriterUtils.mkdirsWithRecursivePermission contains race condition and puts 
> unnecessary load on filesystem
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1619
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1619
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Matthew Ho
>            Priority: Minor
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The current implementation, which recursively calls fs.mkdirs, has the 
> following issues:
>  * *Race condition when creating parent directories, causing a FileNotFound 
> exception even when the file exists on the file system*
>  * *HDFS fs.mkdirs atomically creates missing parent directories. Thus, the 
> recursive approach makes unnecessary calls.*
> HDFS, on which the current FileSystem interface is built, guarantees that 
> the parents will be created, so all FileSystem class implementations should 
> follow this behavior. 
>  
> *Note the 
> [FileSystem|https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html]
>  abstract class documentation says the following:*
> The behaviour of the filesystem is [specified in the Hadoop 
> documentation|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]. 
> However, the normative specification of the behavior of this class is 
> actually HDFS: 
> {color:#de350b}if HDFS does not behave the way these Javadocs or the 
> specification in the Hadoop documentations define, assume that the 
> documentation is incorrect{color}.
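
The race described above can be avoided when every caller issues one "create with parents" call instead of walking the path and creating each level itself. The following is a minimal sketch of that idea, not Gobblin code. It uses `java.nio.file.Files.createDirectories` as a stand-in for `fs.mkdirs` so it runs without Hadoop, and the class and method names (`ConcurrentMkdirsSketch`, `createConcurrently`) are hypothetical. It says nothing about how a particular Hadoop FileSystem implementation behaves.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: java.nio.file instead of Hadoop's FileSystem.
public class ConcurrentMkdirsSketch {

  /** Has `workers` threads create the same directory chain; returns how many of them failed. */
  public static int createConcurrently(Path dir, int workers) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      List<Callable<Path>> tasks = new ArrayList<>();
      for (int i = 0; i < workers; i++) {
        // Each worker makes one call that creates missing parents and
        // does not fail if another worker created them first.
        tasks.add(() -> Files.createDirectories(dir));
      }
      int failures = 0;
      for (Future<Path> result : pool.invokeAll(tasks)) {
        try {
          result.get();
        } catch (ExecutionException e) {
          failures++;
        }
      }
      return failures;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Path root = Files.createTempDirectory("concurrent-mkdirs");
    int failures = createConcurrently(root.resolve("x/y/z"), 8);
    System.out.println("failures=" + failures);
  }
}
```

A check-then-create loop over each ancestor, by contrast, leaves a window between the existence check and the create call in which another writer can change the tree.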



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
