[
https://issues.apache.org/jira/browse/HADOOP-16465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083426#comment-17083426
]
Hudson commented on HADOOP-16465:
---------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18142 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/18142/])
HADOOP-16465 listLocatedStatus() optimisation (#1943) (github: rev
7b2d84d19ce26a030da3a5dd674f763c95b310d9)
* (edit)
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
* (edit)
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> Tune S3AFileSystem.listLocatedStatus
> ------------------------------------
>
> Key: HADOOP-16465
> URL: https://issues.apache.org/jira/browse/HADOOP-16465
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Assignee: Mukund Thakur
> Priority: Major
> Fix For: 3.4.0
>
>
> Looking at logs of LocatedFileStatus/FileInputFormat scans, there's a
> needless call to getFileStatus whenever an S3AFileSystem.listLocatedStatus()
> call is made.
> # {{S3AFileSystem.listLocatedStatus()}} does a getFileStatus call, returns
> the file status first
> # But if you look at all the uses in the MR code in FileInputFormat and
> LocatedFileStatusFetcher, they only call this method *knowing the destination
> is a directory*
> This means that for every unguarded S3 path there are two needless HEADs and
> a single-entry LIST before the real LIST is initiated.
> If the S3A FS can assume that a dest is a non-empty directory, then it can go
> straight to the LIST operation, only falling back to the HEAD and HEAD + "/"
> probes if that fails.
> We could also think about doing the same for listStatus.
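
The request-count argument in the description can be illustrated with a small, self-contained sketch. This is not the S3AFileSystem code or the committed patch: FakeStore, probeThenList and listFirst are hypothetical names, and the in-memory store only models which HEAD and LIST requests each strategy would issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

class ListFirstSketch {

    /** Hypothetical in-memory stand-in for a bucket; counts HEAD and LIST requests. */
    static class FakeStore {
        final TreeSet<String> keys = new TreeSet<>();
        int heads = 0;
        int lists = 0;

        /** HEAD: does an object with exactly this key exist? */
        boolean head(String key) {
            heads++;
            return keys.contains(key);
        }

        /** LIST: up to maxKeys keys under the prefix, excluding the prefix key itself. */
        List<String> list(String prefix, int maxKeys) {
            lists++;
            List<String> out = new ArrayList<>();
            for (String k : keys.tailSet(prefix)) {
                if (!k.startsWith(prefix) || out.size() >= maxKeys) {
                    break;
                }
                if (!k.equals(prefix)) {
                    out.add(k);
                }
            }
            return out;
        }
    }

    /** Probe-first model: HEAD, HEAD + "/", single-entry LIST, then the real LIST. */
    static List<String> probeThenList(FakeStore s, String path) {
        if (s.head(path)) {
            return Collections.singletonList(path);      // path is a file
        }
        boolean isDir = s.head(path + "/")               // directory marker
                || !s.list(path + "/", 1).isEmpty();     // single-entry LIST
        if (!isDir) {
            throw new NoSuchElementException("not found: " + path);
        }
        return s.list(path + "/", Integer.MAX_VALUE);    // the real LIST
    }

    /** LIST-first model: assume a non-empty directory, probe only if the LIST is empty. */
    static List<String> listFirst(FakeStore s, String path) {
        List<String> children = s.list(path + "/", Integer.MAX_VALUE);
        if (!children.isEmpty()) {
            return children;                             // common case: one request
        }
        if (s.head(path)) {
            return Collections.singletonList(path);      // path is a file
        }
        if (s.head(path + "/")) {
            return children;                             // empty directory with marker
        }
        throw new NoSuchElementException("not found: " + path);
    }

    public static void main(String[] args) {
        FakeStore a = new FakeStore();
        a.keys.add("data/part-0000");
        a.keys.add("data/part-0001");
        probeThenList(a, "data");
        System.out.println("probe-first: heads=" + a.heads + " lists=" + a.lists);

        FakeStore b = new FakeStore();
        b.keys.addAll(a.keys);
        listFirst(b, "data");
        System.out.println("list-first:  heads=" + b.heads + " lists=" + b.lists);
    }
}
```

In this model a non-empty directory costs two HEADs and two LISTs with the probe-first sequence and a single LIST with the LIST-first sequence. The trade-off is that a path which turns out to be a file or an empty directory pays for one extra LIST before the HEAD probes run.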