[jira] [Commented] (HADOOP-13278) S3AFileSystem mkdirs does not need to validate parent path components

Steve Loughran (JIRA) Wed, 29 Aug 2018 10:35:09 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596636#comment-16596636
 ]


Steve Loughran commented on HADOOP-13278:
-----------------------------------------

Moving to branch-3.3

As noted, we do want to check up the path, so the current PR isn't going to 
work. The one thing we can do is handle permissions:

# If, during that walk up the tree, an {{AccessDeniedException}} is raised, it 
can be caught and treated as "you can't do anything up there", and the mkdirs 
simply assumes that all is good.
# This will need a test in org.apache.hadoop.fs.s3a.auth.ITestAssumeRole which 
creates a role with the restricted permissions (skipped if s3guard is enabled, 
BTW), and then verifies that {{mkdirs("/a/b/c")}} succeeds even as 
{{getFileStatus("/a")}} fails because {{/a}} is blocked.
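
A rough sketch of the first point, to show the intended control flow only. This is not the S3AFileSystem code: {{StatusProbe}}, {{parentOf}} and {{checkAncestors}} are hypothetical names standing in for a getFileStatus-style call and the walk inside mkdirs, paths are plain strings, and java.nio's {{FileAlreadyExistsException}} is used in place of the Hadoop one so the example has no Hadoop dependency.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.FileAlreadyExistsException;

class MkdirsAncestorCheck {

    /** Hypothetical stand-in for a getFileStatus-style probe; not the S3A API. */
    interface StatusProbe {
        /**
         * Returns true if the path is a file, false if it is a directory.
         * Throws FileNotFoundException if nothing exists at the path and
         * AccessDeniedException if the caller may not read it.
         */
        boolean isFile(String path) throws IOException;
    }

    /** Returns the parent of an absolute path, or null for the root. */
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    /**
     * Walks up from the parent of the given path. A file found on the way is
     * an error; a directory ends the walk; a missing entry means keep going;
     * an access-denied response ends the walk and is treated as "all good".
     */
    static void checkAncestors(String path, StatusProbe probe) throws IOException {
        for (String p = parentOf(path); p != null; p = parentOf(p)) {
            try {
                if (probe.isFile(p)) {
                    throw new FileAlreadyExistsException(p);
                }
                return; // existing directory: nothing above needs checking
            } catch (FileNotFoundException e) {
                // nothing at this level; continue to the next parent
            } catch (AccessDeniedException e) {
                return; // "you can't do anything up there": assume all is good
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed scenario: /a/b does not exist, everything above is blocked.
        StatusProbe restricted = p -> {
            if (p.equals("/a/b")) {
                throw new FileNotFoundException(p);
            }
            throw new AccessDeniedException(p);
        };
        checkAncestors("/a/b/c", restricted);
        System.out.println("ancestor check passed for /a/b/c");
    }
}
```

The restricted probe in {{main}} mirrors the role the test would set up: the probe on {{/a}} is denied, yet the ancestor check for {{/a/b/c}} completes without an exception.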

Not got time to work on this; postponing to 3.3+, contributions *with that 
test* welcome.

> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3, tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
>
> According to S3 semantics, there is no conflict if a bucket contains a key 
> named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are, 
> after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} does go out of its way to 
> traverse every parent path component for the directory it's trying to create, 
> making sure there's no file with that name. This is suboptimal for three main 
> reasons:
>  * Wasted API calls, since the client is getting metadata for each path 
> component 
>  * This can cause *major* problems with buckets whose permissions are being 
> managed by IAM, where access may not be granted to the root bucket, but only 
> to some prefix. When you call {{mkdirs}}, even on a prefix that you have 
> access to, the traversal up the path will cause you to eventually hit the 
> root bucket, which will fail with a 403 - even though the directory creation 
> call would have succeeded.
>  * Some people might actually have a file that matches some other file's 
> prefix... I can't see why they would want to do that, but it's not against 
> S3's rules.
> I've opened a pull request with a simple patch that just removes this portion 
> of the check. I have tested it with my team's instance of Spark + Luigi, and 
> can confirm it works, and resolves the aforementioned permissions issue for a 
> bucket on which we only had prefix access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm 
> not following some convention properly :)



