Adrian Petrescu created HADOOP-13278:
----------------------------------------
Summary: S3AFileSystem mkdirs does not need to validate parent
path components
Key: HADOOP-13278
URL: https://issues.apache.org/jira/browse/HADOOP-13278
Project: Hadoop Common
Issue Type: Bug
Components: tools
Reporter: Adrian Petrescu
Priority: Minor
According to S3 semantics, there is no conflict if a bucket contains a key
named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are,
after all, nothing but prefixes.
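To illustrate the point, here is a minimal, hypothetical sketch (a plain map standing in for a bucket, not any AWS SDK code): S3 is a flat key/value store, so a key {{a/b}} and keys under the prefix {{a/b/}} coexist without conflict, since directories are only inferred from prefixes at listing time.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: models an S3 bucket as a flat map of keys to contents.
public class FlatNamespaceSketch {
    // Returns a "bucket" holding both a key "a/b" and a key under the
    // prefix "a/b/c/" - nothing in S3's flat namespace rejects this.
    static Map<String, String> bucketWithBoth() {
        Map<String, String> bucket = new TreeMap<>();
        bucket.put("a/b", "file contents");        // a "file" named a/b
        bucket.put("a/b/c/data.txt", "more data"); // a key "inside" a/b/c
        return bucket;
    }

    public static void main(String[] args) {
        System.out.println(bucketWithBoth().keySet());
    }
}
```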
However, the {{mkdirs}} call in {{S3AFileSystem}} goes out of its way to
traverse every parent path component of the directory it's trying to create,
making sure no file exists with that name. This is suboptimal for three main
reasons:
* Wasted API calls, since the client is getting metadata for each path
component
* This can cause *major* problems with buckets whose permissions are managed
by IAM, where access may be granted not to the bucket root but only to some
prefix. When you call {{mkdirs}}, even on a prefix you do have access to, the
traversal up the path eventually reaches the bucket root, which fails with a
403 - even though the directory creation itself would have succeeded.
* Some users might legitimately have a file whose name is a prefix of another
key. It's hard to see why they would want that, but it's not against S3's
rules.
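To make the cost of the traversal concrete, here is a hypothetical sketch (not the actual {{S3AFileSystem}} code) of the parent prefixes a walk-up check probes for a key like {{a/b/c}} - one metadata lookup per component, ending at the bucket root, which is exactly where a prefix-scoped IAM policy returns a 403:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: enumerates the parent prefixes a mkdirs-style walk-up
// check would probe, from the immediate parent up to the bucket root.
public class MkdirsTraversalSketch {
    // For "a/b/c" returns ["a/b", "a", ""], where "" stands for the
    // bucket root - each entry is one metadata request against S3.
    static List<String> parentPrefixes(String key) {
        List<String> parents = new ArrayList<>();
        String current = key;
        int idx;
        while ((idx = current.lastIndexOf('/')) >= 0) {
            current = current.substring(0, idx);
            parents.add(current);
        }
        parents.add(""); // the bucket root itself
        return parents;
    }

    public static void main(String[] args) {
        System.out.println(parentPrefixes("a/b/c"));
    }
}
```

Even if the caller's IAM policy grants access only under {{a/b/}}, the last two probes ({{a}} and the root) fall outside that grant, so the whole {{mkdirs}} fails.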
I've opened a pull request with a simple patch that just removes this portion
of the check. I have tested it with my team's instance of Spark + Luigi, and
can confirm it works, and resolves the aforementioned permissions issue for a
bucket on which we only had prefix access.
This is my first ticket/pull request against Hadoop, so let me know if I'm not
following some convention properly :)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)