[ https://issues.apache.org/jira/browse/HADOOP-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162733#comment-16162733 ]
Steve Loughran commented on HADOOP-13712:
-----------------------------------------
We're not going to add any special APIs for opening files in S3A that would then
need ongoing maintenance and come with an expectation that they will never be
removed. So -1 to that. But with the {{createFile()}} builder API, there's always
the ability to provide hints when a file is opened.
Two hints to consider here are (a) the file length and (b) the initial read
position.
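
To make the two hints concrete, here is a minimal simulation, assuming a hypothetical in-memory store that counts requests. {{CountingStore}}, {{HintedFile}} and {{open(store, key, lengthHint, initialPos)}} are illustrative names only, not S3A or Hadoop APIs. A known length lets the open skip the HEAD entirely, and the initial position lets the first GET start where the caller will actually read.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical in-memory object store that counts requests.
// It stands in for S3; it is not the S3A or AWS SDK API.
class CountingStore {
    final Map<String, byte[]> objects = new HashMap<>();
    int heads = 0;
    int gets = 0;

    // HEAD: return the object length, failing if the object is absent.
    long head(String key) {
        heads++;
        byte[] data = objects.get(key);
        if (data == null) {
            throw new IllegalStateException("404 on HEAD " + key);
        }
        return data.length;
    }

    // Ranged GET of bytes [start, end); also fails if the object is absent.
    byte[] get(String key, long start, long end) {
        gets++;
        byte[] data = objects.get(key);
        if (data == null) {
            throw new IllegalStateException("404 on GET " + key);
        }
        return Arrays.copyOfRange(data, (int) start, (int) end);
    }
}

// Hypothetical open() taking the two hints: a known length and the
// position of the first read.
class HintedFile {
    private final CountingStore store;
    private final String key;
    final long length;
    final long initialPos;

    private HintedFile(CountingStore store, String key, long length, long initialPos) {
        this.store = store;
        this.key = key;
        this.length = length;
        this.initialPos = initialPos;
    }

    static HintedFile open(CountingStore store, String key,
                           OptionalLong lengthHint, long initialPos) {
        // Only pay for a HEAD when the caller could not supply the length.
        long len = lengthHint.isPresent() ? lengthHint.getAsLong() : store.head(key);
        return new HintedFile(store, key, len, initialPos);
    }

    // The first GET starts at the hinted position rather than at byte 0.
    byte[] readRemaining() {
        return store.get(key, initialPos, length);
    }
}

class HintedFileDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        store.objects.put("data/part-0000", "hello, s3a".getBytes(StandardCharsets.UTF_8));
        HintedFile f = HintedFile.open(store, "data/part-0000", OptionalLong.of(10), 7);
        String tail = new String(f.readRemaining(), StandardCharsets.UTF_8);
        System.out.println(tail + " HEADs=" + store.heads + " GETs=" + store.gets);
    }
}
```

Running {{HintedFileDemo}} prints {{s3a HEADs=0 GETs=1}}; passing {{OptionalLong.empty()}} instead makes the open fall back to one HEAD. The sketch trusts the hint completely: a wrong length would simply read the wrong range, which is part of the complexity cost of hints.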
We could also consider a "lazy-check" option, which skips the existence check
until the first seek.
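
A "lazy-check" open could look like the following sketch, assuming a simulated store held in a {{Map}}; {{LazyCheckedInput}} and its methods are hypothetical, not the S3A input stream. Construction makes no request at all, and the existence check is folded into the first GET, issued on the first {{seek()}} or {{read()}}.

```java
import java.io.FileNotFoundException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical "lazy-check" input: opening makes no request; the existence
// check happens in the first GET, issued on the first seek() or read().
// For simplicity the whole object is fetched once. Simulated, not S3A code.
class LazyCheckedInput {
    private final Map<String, byte[]> store;
    private final String key;
    private byte[] body;   // null until the first GET succeeds
    private int pos = 0;
    int gets = 0;

    LazyCheckedInput(Map<String, byte[]> store, String key) {
        this.store = store;
        this.key = key; // no request here: a missing file is not detected yet
    }

    private void ensureFetched() throws FileNotFoundException {
        if (body == null) {
            gets++; // simulated GET, doubling as the existence check
            body = store.get(key);
            if (body == null) {
                throw new FileNotFoundException("No such file: " + key);
            }
        }
    }

    void seek(int newPos) throws FileNotFoundException {
        if (newPos < 0) {
            throw new IllegalArgumentException("negative seek: " + newPos);
        }
        ensureFetched();
        pos = newPos;
    }

    // Returns the next byte, or -1 at end of file.
    int read() throws FileNotFoundException {
        ensureFetched();
        return pos < body.length ? (body[pos++] & 0xff) : -1;
    }
}

class LazyCheckDemo {
    public static void main(String[] args) throws FileNotFoundException {
        Map<String, byte[]> store = new HashMap<>();
        store.put("logs/a.txt", "abc".getBytes(StandardCharsets.UTF_8));
        // Opening a missing file succeeds with zero requests...
        LazyCheckedInput missing = new LazyCheckedInput(store, "logs/missing.txt");
        try {
            missing.read(); // ...and the failure surfaces on first use.
        } catch (FileNotFoundException e) {
            System.out.println("deferred failure: " + e.getMessage());
        }
        LazyCheckedInput present = new LazyCheckedInput(store, "logs/a.txt");
        present.seek(1);
        System.out.println((char) present.read()); // prints b
    }
}
```

The trade-off is visible in the demo: the open itself can no longer report a missing file, so callers see the {{FileNotFoundException}} later than they do today.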
With S3Guard around, the cost of {{getFileStatus()}} is lower, so I'm less
worried about the cost of that initial HEAD; I'm now more worried about the
complexity of the codebase.
At the same time, it's interesting to consider what could be done to speed up
unguarded stores.
(I've also been thinking about whether, alongside HADOOP-13282, we should
collect and use the etag of a file from the first open (or at least the first
{{seek()}}) to detect and react to file updates: we could identify when a file
has changed and fail. S3Guard doesn't currently track those etags, though.)
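
The etag idea can be sketched as a reader that remembers the etag from its first GET and fails if a later GET (for example, one issued after a {{seek()}}) sees a different etag. {{EtagStore}} and {{ChangeDetectingReader}} are hypothetical and simulated in memory; this is not how S3A or S3Guard currently behave.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned store: each object carries an etag that changes
// whenever the object is overwritten. Simulated; not the S3 API.
class EtagStore {
    static final class Entry {
        final String etag;
        final byte[] body;

        Entry(String etag, byte[] body) {
            this.etag = etag;
            this.body = body;
        }
    }

    final Map<String, Entry> objects = new HashMap<>();

    void put(String key, String etag, byte[] body) {
        objects.put(key, new Entry(etag, body));
    }
}

// Records the etag seen on the first GET and fails any later GET
// that observes a different etag.
class ChangeDetectingReader {
    private final EtagStore store;
    private final String key;
    private String firstEtag; // null until the first GET

    ChangeDetectingReader(EtagStore store, String key) {
        this.store = store;
        this.key = key;
    }

    byte[] readRange(int start, int end) throws IOException {
        EtagStore.Entry e = store.objects.get(key);
        if (e == null) {
            throw new FileNotFoundException(key);
        }
        if (firstEtag == null) {
            firstEtag = e.etag; // first GET: remember what we saw
        } else if (!firstEtag.equals(e.etag)) {
            throw new IOException("File changed during read: " + key
                + " (etag " + firstEtag + " -> " + e.etag + ")");
        }
        return Arrays.copyOfRange(e.body, start, end);
    }
}

class EtagDemo {
    public static void main(String[] args) throws IOException {
        EtagStore store = new EtagStore();
        store.put("data/f", "etag-1", new byte[] {1, 2, 3, 4});
        ChangeDetectingReader reader = new ChangeDetectingReader(store, "data/f");
        reader.readRange(0, 2);                                 // records etag-1
        store.put("data/f", "etag-2", new byte[] {5, 6, 7, 8}); // overwritten
        try {
            reader.readRange(2, 4);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Instead of comparing client-side, a real client could also ask S3 to enforce this with a conditional GET carrying an {{If-Match}} header, so that a changed object fails the request itself.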
Anyway, that's not a conclusive answer, except for a "-1 to any new public API".
Have a look at the new builder API for opening files, see if there's a way to
do it there, and we can think about it.
> S3A open to avoid needless HEAD on the successful execution path
> ----------------------------------------------------------------
>
> Key: HADOOP-13712
> URL: https://issues.apache.org/jira/browse/HADOOP-13712
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Steve Loughran
>
> S3A's open() operation does a {{getFileStatus()}} check to verify that the
> path is not a directory before opening it with a GET. That initial check costs
> at least one HEAD request if the file is present, and more if it isn't.
> As the GET itself performs the existence check, the initial check is needless.
> A successful GET of a path which doesn't end in "/" means a file was there.
> The only reason a {{getFileStatus()}} call is needed is to choose which error
> to raise if the path isn't there: a FileNotFoundException (FNFE) or a
> path-is-directory error.
> Proposed: reorder the code to do the GET first, and fall back to
> {{getFileStatus()}} only if that fails.
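
The proposed reordering can be sketched against a simulated bucket that counts requests. {{SimBucket}}, {{SimOpen}} and {{IsDirectoryException}} are hypothetical stand-ins (Hadoop's own exception types and S3A's real probe logic are not reproduced here), and paths are assumed to be normalized with no trailing "/", so a successful GET means a file was there.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a "path is a directory" error.
class IsDirectoryException extends IOException {
    IsDirectoryException(String path) {
        super(path + " is a directory");
    }
}

// Simulated bucket that counts requests. Keys are normalized paths with no
// trailing "/"; a key "a/b" makes "a" a directory. Not the S3A code.
class SimBucket {
    final Map<String, byte[]> objects = new HashMap<>();
    int probes = 0; // HEAD/LIST requests made by getFileStatus()
    int gets = 0;

    // GET returns null on 404 to keep the sketch short.
    byte[] get(String key) {
        gets++;
        return objects.get(key);
    }

    // Simplified getFileStatus(): one HEAD if the file exists,
    // a further probe for children if it does not.
    String getFileStatus(String key) {
        probes++;
        if (objects.containsKey(key)) {
            return "file";
        }
        probes++;
        String prefix = key + "/";
        for (String k : objects.keySet()) {
            if (k.startsWith(prefix)) {
                return "dir";
            }
        }
        return "missing";
    }
}

class SimOpen {
    // Current order: getFileStatus() first, then the GET.
    static byte[] openStatusFirst(SimBucket b, String path) throws IOException {
        String status = b.getFileStatus(path);
        if (status.equals("dir")) {
            throw new IsDirectoryException(path);
        }
        if (status.equals("missing")) {
            throw new FileNotFoundException(path);
        }
        byte[] body = b.get(path);
        if (body == null) { // deleted between the probe and the GET
            throw new FileNotFoundException(path);
        }
        return body;
    }

    // Proposed order: GET first; only when it fails, call getFileStatus()
    // to decide which exception to raise.
    static byte[] openGetFirst(SimBucket b, String path) throws IOException {
        byte[] body = b.get(path);
        if (body != null) {
            return body; // success path: one request, no probe
        }
        if (b.getFileStatus(path).equals("dir")) {
            throw new IsDirectoryException(path);
        }
        throw new FileNotFoundException(path);
    }
}

class SimOpenDemo {
    public static void main(String[] args) throws IOException {
        SimBucket b = new SimBucket();
        b.objects.put("tables/t1/part-0", new byte[] {42});
        SimOpen.openGetFirst(b, "tables/t1/part-0");
        System.out.println("GET-first: gets=" + b.gets + " probes=" + b.probes);
    }
}
```

On the success path the status-first order costs one probe plus the GET, while GET-first costs only the GET; the {{getFileStatus()}} work moves to the failure path, where it is used only to pick the right exception.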
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)