[ 
https://issues.apache.org/jira/browse/HADOOP-19072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839279#comment-17839279
 ] 

ASF GitHub Bot commented on HADOOP-19072:
-----------------------------------------

virajjasani commented on code in PR #6543:
URL: https://github.com/apache/hadoop/pull/6543#discussion_r1573386499


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/MkdirOperation.java:
##########
@@ -124,7 +138,32 @@ public Boolean execute() throws IOException {
       return true;
     }
 
-    // Walk path to root, ensuring closest ancestor is a directory, not file
+    // if performance creation mode is set, no need to check
+    // whether the closest ancestor is dir.
+    if (!performanceCreation) {
+      verifyFileStatusOfClosestAncestor();
+    }
+
+    // if we get here there is no directory at the destination.
+    // so create one.
+
+    // Create the marker file, delete the parent entries
+    // if the filesystem isn't configured to retain them
+    callbacks.createFakeDirectory(dir, false);
+    return true;
+  }
+
+  /**
+   * Verify the file status of the closest ancestor, if it is
+   * dir, the mkdir operation should proceed. If it is file,
+   * the mkdir operation should throw error.
+   *
+   * @throws IOException If either file status could not be retrieved,
+   * or if the closest ancestor is a file.
+   */
+  private void verifyFileStatusOfClosestAncestor() throws IOException {
+    FileStatus fileStatus;
+    // Walk path to root, ensuring the closest ancestor is a directory, not file
     Path fPart = dir.getParent();
     try {
       while (fPart != null && !fPart.isRoot()) {

Review Comment:
   I believe that is what we are already doing. We walk the path from the given 
dir to its parent: if the parent is a directory, we are good; if the parent is 
a file, we throw an error. If the parent's fileStatus is null, we move up to 
the parent's parent and continue.
   
   So if we are trying to create a/b/c/d/ where a/b/c does not exist but a/b 
does, we check whether a/b is a file or a dir. The while loop here first visits 
a/b/c, finds that its file status is null, goes one level up to a/b, finds a 
file status there, and makes the decision based on whether a/b is a file or a 
dir.
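
   The walk described above can be sketched in isolation as follows. This is 
only an illustration, not the code in the PR: the class `AncestorWalkSketch`, 
the `Kind` enum, and the in-memory `Map` standing in for the store are all 
hypothetical, and paths are plain slash-separated strings without a trailing 
slash rather than Hadoop `Path` objects. A key missing from the map plays the 
role of a null file status.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class AncestorWalkSketch {

  /** Hypothetical entry kinds; a path absent from the map stands for a null file status. */
  enum Kind { FILE, DIRECTORY }

  /** Returns the parent of a slash-separated relative path, or null at the top level. */
  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx < 0 ? null : path.substring(0, idx);
  }

  /**
   * Walks from the parent of dir towards the root and stops at the first
   * ancestor that exists. Returns that ancestor if it is a directory,
   * returns null if no ancestor exists, and throws if it is a file.
   */
  static String verifyClosestAncestor(Map<String, Kind> store, String dir)
      throws IOException {
    String part = parentOf(dir);
    while (part != null) {
      Kind kind = store.get(part);
      if (kind == Kind.FILE) {
        throw new IOException(
            "Cannot create " + dir + ": ancestor " + part + " is a file");
      }
      if (kind == Kind.DIRECTORY) {
        return part; // closest existing ancestor is a directory: mkdir may proceed
      }
      part = parentOf(part); // nothing at this level, go one level up
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    Map<String, Kind> store = new HashMap<>();
    store.put("a", Kind.DIRECTORY);
    store.put("a/b", Kind.DIRECTORY);
    // a/b/c is missing, so the walk falls through to a/b.
    System.out.println(verifyClosestAncestor(store, "a/b/c/d")); // prints "a/b"
  }
}
```

   If a/b were registered as a FILE instead, the same call would throw an 
IOException, which corresponds to the case where mkdir has to fail.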





> S3A: expand optimisations on stores with "fs.s3a.create.performance"
> --------------------------------------------------------------------
>
>                 Key: HADOOP-19072
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19072
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> on an s3a store with fs.s3a.create.performance set, speed up other operations
> *  mkdir to skip parent directory check: just do a HEAD to see if there's a 
> file at the target location



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
