steveloughran commented on a change in pull request #1407: HADOOP-16490. Improve S3Guard handling of FNFEs in copy
URL: https://github.com/apache/hadoop/pull/1407#discussion_r322709982
 
 

 ##########
 File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
 ##########
 @@ -2587,6 +2594,30 @@ S3AFileStatus innerGetFileStatus(final Path f,
     entryPoint(INVOCATION_GET_FILE_STATUS);
     checkNotClosed();
     final Path path = qualify(f);
+    return resolveFileStatus(path, needEmptyDirectoryFlag, false);
+  }
+
+
+  /**
+   * Get the status of a file or directory, first through S3Guard and then
+   * through S3.
+   * The S3 probes can leave 404 responses in the S3 load balancers; if
+   * a check is only needed for a directory, declaring this saves time and
+   * avoids creating one for the object.
+   * When only probing for directories, if an entry for a file is found in
+   * S3Guard it is returned, but checks for updated values are skipped.
+   * @param path fully qualified path
+   * @param needEmptyDirectoryFlag if true, implementation will calculate
+   *        a TRUE or FALSE value for {@link S3AFileStatus#isEmptyDirectory()}
+   * @param onlyProbeForDirectory only perform the directory probes.
+   * @return a S3AFileStatus object
+   * @throws FileNotFoundException when the path does not exist
+   * @throws IOException on other problems.
+   */
+  private S3AFileStatus resolveFileStatus(final Path path,
 
 Review comment:
   I have plans here. Specifically, I want to push it down a layer in any 
refactored S3A FS. innerGetFileStatus() is essentially the obsolete one.
   
   Once this PR is in, I want to switch those operations which list paths *and 
whose callers expect to see directory lists of some kind* to do the dir list 
first, probably in the order: LIST, HEAD / HEAD (HADOOP-16465), and massively 
reduce the number of calls made during treewalks in query planning. So we need 
new args.
   
   Let me review this; I could make that StatusProbeEnum an explicit param to 
resolveFileStatus() or innerGetFileStatus() after moving the entry 
point/qualification code up there to getFileStatus().
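
   To make the "explicit param" idea concrete, here is a rough sketch of how a 
set of probes could be passed in and ordered. Everything in it is hypothetical: 
the enum, its constants, the class name and planProbes() are illustrative names 
only, not the actual S3A code, and the relative order of the two HEAD probes is 
an assumption.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: none of these names come from S3AFileSystem. */
final class ProbeSketch {

  /** Hypothetical probe kinds a status lookup might issue against S3. */
  enum StatusProbe { HEAD, DIR_MARKER, LIST }

  /** Every probe: used when the path may be a file or a directory. */
  static final Set<StatusProbe> ALL = EnumSet.allOf(StatusProbe.class);

  /** Directory-only probes: skips the HEAD on the object key itself. */
  static final Set<StatusProbe> DIRECTORIES =
      EnumSet.of(StatusProbe.DIR_MARKER, StatusProbe.LIST);

  /**
   * Decide the order in which the requested probes would be issued.
   * @param probes the probes the caller is willing to pay for
   * @param listFirst true if the caller expects a directory, so LIST goes first
   * @return the probes in the order they would be attempted
   */
  static List<StatusProbe> planProbes(Set<StatusProbe> probes,
      boolean listFirst) {
    List<StatusProbe> order = new ArrayList<>();
    if (listFirst && probes.contains(StatusProbe.LIST)) {
      order.add(StatusProbe.LIST);
    }
    if (probes.contains(StatusProbe.HEAD)) {
      order.add(StatusProbe.HEAD);
    }
    if (probes.contains(StatusProbe.DIR_MARKER)) {
      order.add(StatusProbe.DIR_MARKER);
    }
    if (!listFirst && probes.contains(StatusProbe.LIST)) {
      order.add(StatusProbe.LIST);
    }
    return order;
  }

  public static void main(String[] args) {
    // A treewalk caller that expects directories asks for LIST first.
    System.out.println(planProbes(ALL, true));
    // A directory-only check never issues the HEAD on the object key.
    System.out.println(planProbes(DIRECTORIES, false));
  }
}
```

   With something of this shape, a boolean such as onlyProbeForDirectory would 
become one particular choice of probe set, and the list-first ordering another 
argument alongside it rather than yet another boolean.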
