Steve Loughran created HADOOP-13654:
---------------------------------------
Summary: S3A create() to support asynchronous check of dest &
parent paths
Key: HADOOP-13654
URL: https://issues.apache.org/jira/browse/HADOOP-13654
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 2.7.3
Reporter: Steve Loughran
One source of delays in S3A is the need to check if a destination path exists
in create; this makes sure the operation isn't trying to overwrite a directory.
# This is slow: 1-4 HTTPS requests.
# The code doesn't seem to check the entire parent path to make sure there
isn't a file as a parent (which raises the question: shouldn't we have a
contract test for this?)
# Even with the create overwrite=false check, the fact that the new object
isn't created until the output stream is close()'d means that the check has
race conditions.
Instead of doing a synchronous check in create(), we could do an asynchronous
check of the parent directory tree. If any error surfaced, this could be cached
and then thrown on the next call to: write(), flush() or close(); that is, the
failure of a create due to path problems would not surface immediately on the
create() call, *but before any writes were committed*.
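A minimal sketch of the deferred check described above, using only the JDK. Everything named here (DeferredCheckOutputStream, PathCheck, the wrapped stream and the executor) is hypothetical and is not existing S3A code; it assumes the real path probe can be expressed as a callback which throws IOException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

/** Hypothetical wrapper stream; none of these names are S3A classes. */
class DeferredCheckOutputStream extends OutputStream {

  /** A path check which may fail with an IOException. */
  interface PathCheck {
    void run() throws IOException;
  }

  private final OutputStream inner;
  private final CompletableFuture<Void> check;

  DeferredCheckOutputStream(OutputStream inner, PathCheck pathCheck, Executor pool) {
    this.inner = inner;
    // Start the check on the supplied executor instead of running it inline.
    this.check = CompletableFuture.runAsync(() -> {
      try {
        pathCheck.run();
      } catch (IOException e) {
        // Runnable cannot throw checked exceptions, so wrap and unwrap later.
        throw new UncheckedIOException(e);
      }
    }, pool);
  }

  /** Rethrow a cached failure; wait for the check only when block is true. */
  private void verify(boolean block) throws IOException {
    if (!block && !check.isDone()) {
      return; // check still running: let the caller proceed
    }
    try {
      check.join();
    } catch (CompletionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof UncheckedIOException) {
        throw ((UncheckedIOException) cause).getCause();
      }
      throw new IOException("Path check failed", cause);
    }
  }

  @Override
  public void write(int b) throws IOException {
    verify(false);
    inner.write(b);
  }

  @Override
  public void flush() throws IOException {
    verify(false);
    inner.flush();
  }

  @Override
  public void close() throws IOException {
    // Block here: nothing may be committed until the check has passed.
    // On failure the inner stream is deliberately not closed; a real
    // implementation would abort the upload at this point.
    verify(true);
    inner.close();
  }
}
```

In this sketch write() and flush() only surface a failure which has already been recorded, while close() waits for the outcome, so the error is always raised before the wrapped stream's close() could commit anything.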
The full directory tree can/should be checked, and its results remembered. This
would allow for the post-commit cleanup to issue delete() requests purely for
those paths (if any) which referred to directories.
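A rough sketch of such a tree check, again JDK-only. ParentTreeCheck, Kind and the probe function are hypothetical names; the probe stands in for whatever store lookup would classify a key, and keys are assumed to be slash-separated strings.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper; not part of S3A. */
class ParentTreeCheck {

  /** What a probe of a single key found in the store. */
  enum Kind { ABSENT, FILE, DIRECTORY }

  /** Ancestor keys of a slash-separated key, nearest parent first. */
  static List<String> ancestors(String key) {
    List<String> result = new ArrayList<>();
    int slash = key.lastIndexOf('/');
    while (slash > 0) {
      key = key.substring(0, slash);
      result.add(key);
      slash = key.lastIndexOf('/');
    }
    return result;
  }

  /**
   * Probe every ancestor of the key. Fails if any ancestor is a file;
   * otherwise returns the ancestors which exist as directories, so that
   * later cleanup only needs to issue deletes for those.
   */
  static List<String> check(String key, Function<String, Kind> probe)
      throws IOException {
    List<String> directories = new ArrayList<>();
    for (String parent : ancestors(key)) {
      Kind kind = probe.apply(parent);
      if (kind == Kind.FILE) {
        throw new IOException("Parent path is a file: " + parent);
      }
      if (kind == Kind.DIRECTORY) {
        directories.add(parent);
      }
    }
    return directories;
  }
}
```

The returned list is the "remembered" result: ancestors which were absent never appear in it, so no delete() would be issued for them.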
As well as the need to use the AWS thread pool, there's a bit of complexity
with cancelling multipart uploads: the output stream needs to know that the
request failed, and that the multipart should be aborted.
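One way the abort decision could look, as a sketch only. CheckedUploadCloser and its Upload interface are hypothetical stand-ins for an in-progress multipart upload, not AWS SDK or S3A types; the path check is assumed to be available as a CompletableFuture.

```java
import java.io.IOException;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical helper; not part of S3A or the AWS SDK. */
class CheckedUploadCloser {

  /** Stand-in for an in-progress multipart upload. */
  interface Upload {
    void complete() throws IOException;
    void abort() throws IOException;
  }

  /**
   * Wait for the path check, then either complete the upload or abort it.
   * The upload is never completed if the check failed or was cancelled.
   */
  static void finish(Upload upload, CompletableFuture<?> pathCheck)
      throws IOException {
    try {
      pathCheck.join();
    } catch (CompletionException | CancellationException e) {
      upload.abort();
      throw new IOException("Path check failed; upload aborted", e);
    }
    upload.complete();
  }
}
```

A stream's close() could call something like finish() so that a failed check always leads to an abort rather than a completed upload.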
If the complexity of the asynchronous calls can be coped with, *and client code
is happy to accept errors in any IO call to the output stream*, then the
initial overhead at file creation could be skipped.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)