Steve Loughran created HADOOP-16465:
---------------------------------------

             Summary: S3AFileSystem.listLocatedStatus to LIST before HEAD
                 Key: HADOOP-16465
                 URL: https://issues.apache.org/jira/browse/HADOOP-16465
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.2.0
            Reporter: Steve Loughran


Looking at logs of LocatedFileStatus/FileInputFormat scans, there's a needless call to getFileStatus whenever a S3AFileSystem.listLocatedStatus() call is made.

# {{S3AFileSystem.listLocatedStatus()}} makes a getFileStatus call on the path first, before it lists anything.
# But if you look at all the uses in the MR code in FileInputFormat and LocatedFileStatusFetcher, they only call this method *knowing the destination is a directory*.

Which means that for every unguarded S3 path there are two needless HEADs and a single-entry LIST before the real LIST is initiated.

If the S3A FS can assume that a dest is a non-empty directory, then it can go straight to the LIST operation, only falling back to the HEAD and HEAD + "/" probes if that LIST fails.

We could also think about doing the same for listStatus.
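
The proposed ordering can be sketched as follows. This is a minimal illustration against an in-memory stand-in, not the S3AFileSystem code: {{FakeStore}}, its {{head}} and {{list}} methods, and the request counters are all hypothetical names invented for the example, and the listing is flat rather than delimiter-based.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Sketch only: an in-memory stand-in for an object store, not the S3A code. */
final class ListBeforeHeadSketch {

  /** Hypothetical store that counts the HEAD and LIST requests it serves. */
  static final class FakeStore {
    private final TreeSet<String> keys = new TreeSet<>();
    int heads;
    int lists;

    void put(String key) {
      keys.add(key);
    }

    /** HEAD: does an object exist at exactly this key? */
    boolean head(String key) {
      heads++;
      return keys.contains(key);
    }

    /** LIST: every key under the prefix, excluding the prefix key itself. */
    List<String> list(String prefix) {
      lists++;
      List<String> result = new ArrayList<>();
      for (String k : keys.tailSet(prefix, false)) {
        if (!k.startsWith(prefix)) {
          break; // sorted order: no later key can match the prefix
        }
        result.add(k);
      }
      return result;
    }
  }

  /** LIST first; probe with HEAD only when the listing comes back empty. */
  static List<String> listLocatedStatus(FakeStore store, String path)
      throws FileNotFoundException {
    String dirPrefix = path.endsWith("/") ? path : path + "/";
    List<String> children = store.list(dirPrefix);
    if (!children.isEmpty()) {
      return children; // non-empty directory: one request in total
    }
    if (store.head(path)) {
      return Collections.singletonList(path); // the path is a plain file
    }
    if (store.head(dirPrefix)) {
      return Collections.emptyList(); // empty-directory marker object
    }
    throw new FileNotFoundException(path);
  }

  public static void main(String[] args) throws Exception {
    FakeStore store = new FakeStore();
    store.put("data/part-0000");
    store.put("data/part-0001");
    store.put("file.txt");

    List<String> entries = listLocatedStatus(store, "data");
    System.out.println(entries.size() + " entries, LIST=" + store.lists
        + " HEAD=" + store.heads); // prints "2 entries, LIST=1 HEAD=0"
  }
}
```

In this sketch the common case (a non-empty directory) costs a single LIST, while a file or an empty directory pays for the LIST plus one or two HEADs, which is the trade-off the issue describes.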