steveloughran commented on code in PR #2149: URL: https://github.com/apache/hadoop/pull/2149#discussion_r2251063163
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -4086,25 +4175,41 @@ public boolean exists(Path f) throws IOException {
   }
 
   /**
-   * Override superclass so as to add statistic collection.
+   * Optimized probe for a path referencing a dir.
+   * Even though it is optimized to a single HEAD, applications
+   * should not over-use this method...it is all too common.
    * {@inheritDoc}
    */
   @Override
   @SuppressWarnings("deprecation")
   public boolean isDirectory(Path f) throws IOException {

Review Comment:
   Not good. File a PR, including what you can of the stack of checks.

   What is probably happening is that the method calling this assumes all the paths are directories (which this call is optimised for), but as all the paths are files it ends up doing a LIST of the path. So yes, it would be a step backwards.

   The code should be calling getFileStatus() to really get everything about a file.

   How, and why, are you providing a list of many, many files, given that Spark expects to be working on a directory at a time?
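
To make the cost argument in the comment concrete, here is a minimal toy model. It is not the S3A implementation: `ToyStore` and both helper methods are hypothetical names, and the request costs are assumptions (the directory probe issues a LIST, as the comment describes; the full status lookup is assumed to try a HEAD first and fall back to a LIST). It only counts simulated requests when every supplied path is a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: not S3A code. Request costs are assumptions. */
public class StatusProbeSketch {

  /** Hypothetical stand-in for an object store that records each request. */
  static final class ToyStore {
    private final Set<String> files;
    final List<String> requests = new ArrayList<>();

    ToyStore(Set<String> files) {
      this.files = files;
    }

    /** Directory-optimised probe: assumed to issue a LIST under the path. */
    boolean isDirectory(String path) {
      requests.add("LIST " + path);
      String prefix = path.endsWith("/") ? path : path + "/";
      return files.stream().anyMatch(f -> f.startsWith(prefix));
    }

    /** Full status lookup: assumed to try HEAD first, then fall back to LIST. */
    String getFileStatus(String path) {
      requests.add("HEAD " + path);
      if (files.contains(path)) {
        return "file";
      }
      return isDirectory(path) ? "directory" : "missing";
    }
  }

  /** Caller assumes directories: probe first, then look the path up again. */
  static int probeThenStatus(ToyStore store, List<String> paths) {
    for (String p : paths) {
      if (!store.isDirectory(p)) {
        store.getFileStatus(p);
      }
    }
    return store.requests.size();
  }

  /** Caller asks for the full status once and branches on the result. */
  static int statusOnly(ToyStore store, List<String> paths) {
    for (String p : paths) {
      store.getFileStatus(p);
    }
    return store.requests.size();
  }

  public static void main(String[] args) {
    Set<String> files = Set.of("data/a.parquet", "data/b.parquet", "data/c.parquet");
    List<String> paths = new ArrayList<>(files);
    System.out.println("probe then status: " + probeThenStatus(new ToyStore(files), paths));
    System.out.println("status only: " + statusOnly(new ToyStore(files), paths));
  }
}
```

Under these assumptions, three file paths cost six simulated requests with the probe-first pattern and three with a single status lookup per path. Real request counts depend on the S3A version and configuration.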