steveloughran commented on code in PR #2149:
URL: https://github.com/apache/hadoop/pull/2149#discussion_r2251063163


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -4086,25 +4175,41 @@ public boolean exists(Path f) throws IOException {
   }
 
   /**
-   * Override superclass so as to add statistic collection.
+   * Optimized probe for a path referencing a dir.
+   * Even though it is optimized to a single HEAD, applications
+   * should not over-use this method...it is all too common.
    * {@inheritDoc}
    */
   @Override
   @SuppressWarnings("deprecation")
   public boolean isDirectory(Path f) throws IOException {

Review Comment:
   not good. file a PR, including what you can of the stack of checks. 
   
   What is probably happening is that the calling method assumes all the 
paths are directories (which this call is optimised for), but as all the 
paths are files, each call ends up doing
   
   LIST path
   
   so yes, it would be a step backwards. The code should be calling 
getFileStatus to get everything about a file.
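   
   A rough sketch of the pattern I mean, not S3A code. `Fs`, `Status` and 
`InMemoryFs` are hypothetical stand-ins whose method names mirror 
`FileSystem.getFileStatus(Path)` and `FileStatus.isDirectory()`, so that the 
snippet runs without Hadoop on the classpath. The assumption is that the 
caller holds a list of paths and wants to know what each one is:

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: resolves each path with a single getFileStatus call. */
class StatusProbeSketch {

  /** Hypothetical stand-in for org.apache.hadoop.fs.FileStatus. */
  static final class Status {
    final String path;
    final boolean directory;
    final long length;

    Status(String path, boolean directory, long length) {
      this.path = path;
      this.directory = directory;
      this.length = length;
    }

    boolean isDirectory() {
      return directory;
    }
  }

  /** Hypothetical stand-in for the one FileSystem call used here. */
  interface Fs {
    Status getFileStatus(String path) throws FileNotFoundException;
  }

  /** In-memory Fs that counts calls, so the sketch can be exercised. */
  static final class InMemoryFs implements Fs {
    private final Map<String, Status> entries = new HashMap<>();
    int statusCalls = 0;

    InMemoryFs addFile(String path, long length) {
      entries.put(path, new Status(path, false, length));
      return this;
    }

    InMemoryFs addDir(String path) {
      entries.put(path, new Status(path, true, 0L));
      return this;
    }

    @Override
    public Status getFileStatus(String path) throws FileNotFoundException {
      statusCalls++;
      Status status = entries.get(path);
      if (status == null) {
        throw new FileNotFoundException(path);
      }
      return status;
    }
  }

  /** One getFileStatus call per path; missing paths are skipped. */
  static List<Status> resolve(Fs fs, List<String> paths) {
    List<Status> found = new ArrayList<>();
    for (String path : paths) {
      try {
        // The returned status already says whether this is a file or a
        // directory, so no separate isDirectory(path) probe is needed.
        found.add(fs.getFileStatus(path));
      } catch (FileNotFoundException e) {
        // Path does not exist: nothing to record.
      }
    }
    return found;
  }

  public static void main(String[] args) {
    InMemoryFs fs = new InMemoryFs().addDir("/data").addFile("/data/a.csv", 10L);
    List<String> paths = Arrays.asList("/data", "/data/a.csv", "/missing");
    for (Status s : resolve(fs, paths)) {
      System.out.println(s.path + " dir=" + s.isDirectory() + " len=" + s.length);
    }
    System.out.println("getFileStatus calls: " + fs.statusCalls);
  }
}
```

   The point is only that the caller branches on the status it already has 
rather than probing every path with isDirectory first.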
   
   How and why are you providing a list of many, many files, given that Spark 
expects to be working on a directory at a time?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


