[ 
https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332628#comment-15332628
 ] 

ASF GitHub Bot commented on HADOOP-13278:
-----------------------------------------

GitHub user apetresc opened a pull request:

    https://github.com/apache/hadoop/pull/100

    HADOOP-13278. S3AFileSystem mkdirs does not need to validate parent path 
components

    According to S3 semantics, there is no conflict if a bucket contains a key 
named `a/b` and also a directory named `a/b/c`. "Directories" in S3 are, after 
all, nothing but prefixes.
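    That flat-namespace point can be sketched in a few lines (plain Python 
modeling a bucket as a dict of keys; all names here are made up for 
illustration):

```python
# An S3 bucket is a flat map from key strings to blobs; "directories"
# are purely a client-side convention layered over key prefixes.
bucket = {}

# An object at key "a/b" ...
bucket["a/b"] = b"file contents"

# ... and an object under the "a/b/" prefix can coexist: keys are
# independent strings, so S3 sees no conflict between them.
bucket["a/b/c"] = b"other contents"

# Listing the "directory" a/b/ is just a prefix scan.
children = [k for k in bucket if k.startswith("a/b/")]
print(children)  # ['a/b/c']
```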
    
    However, the `mkdirs` call in `S3AFileSystem` does go out of its way to 
traverse every parent path component for the directory it's trying to create, 
making sure there's no file with that name. This is suboptimal for three main 
reasons:
    
     * Wasted API calls, since the client is getting metadata for each path 
component 
     * This can cause *major* problems with buckets whose permissions are being 
managed by IAM, where access may not be granted to the root bucket, but only to 
some prefix. When you call `mkdirs`, even on a prefix that you have access to, 
the traversal up the path will cause you to eventually hit the root bucket, 
which will fail with a 403 - even though the directory creation call would have 
succeeded.
     * Some people might actually have a file whose key is a prefix of 
another object's key... I can't see why they would want to do that, but it's 
not against S3's rules.
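    The parent walk and its IAM failure mode can be sketched as follows (a 
toy model, not the actual `S3AFileSystem` code; the helper names and the 
`team/data` prefix are hypothetical):

```python
# Toy model: the caller's IAM policy only grants access to keys
# under "team/data" (hypothetical prefix), not to the bucket root.
ALLOWED_PREFIX = "team/data"

def head_object(key):
    """Simulated metadata lookup: 403 outside the granted prefix."""
    if not key.startswith(ALLOWED_PREFIX):
        raise PermissionError(f"403 Forbidden: {key}")
    return {"key": key}

def mkdirs_with_parent_walk(path):
    """Old behavior: one metadata call per ancestor, up to the root."""
    parts = path.strip("/").split("/")
    for i in range(len(parts), 0, -1):
        head_object("/".join(parts[:i]))  # O(depth) API calls

def mkdirs_direct(path):
    """Patched behavior: touch only the target prefix itself."""
    head_object(path)

# The walk on "team/data/x/y" checks "team/data/x/y", "team/data/x",
# "team/data", then "team" -- and the last one is denied, even though
# the caller could create everything under "team/data".
err = None
try:
    mkdirs_with_parent_walk("team/data/x/y")
except PermissionError as e:
    err = e
print(err)  # 403 Forbidden: team

# Skipping the walk succeeds with a single call inside the prefix.
mkdirs_direct("team/data/x/y")
```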
    
    [I've opened a ticket](https://issues.apache.org/jira/browse/HADOOP-13278) 
on the Hadoop JIRA. This pull request is a simple patch that just removes this 
portion of the check. I have tested it with my team's instance of Spark + 
Luigi, and can confirm it works, and resolves the aforementioned permissions 
issue for a bucket on which we only had prefix access.
    
    This is my first ticket/pull request against Hadoop, so let me know if I'm 
not following some convention properly :)
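    For context, a prefix-scoped IAM policy of the kind described above looks 
roughly like this (bucket and prefix names are hypothetical); note there is no 
grant on the bucket root itself, which is why an upward traversal eventually 
hits a 403:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::example-bucket/team/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::example-bucket",
      "Condition": {"StringLike": {"s3:prefix": "team/data/*"}}
    }
  ]
}
```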

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rubikloud/hadoop s3a-root-path-components

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #100
    
----
commit 8a28062d34e5f0c0b83a9577dc9d818bab58c269
Author: Adrian Petrescu <apetr...@gmail.com>
Date:   2016-06-15T14:15:21Z

    No need to check parent path components when creating a directory.
    
    Given S3 semantics, there's actually no problem with having a/b/c be a 
prefix even if
    a/b or a is already a file. So there's no need to check for it - it wastes 
API calls
    and can lead to problems with access control if the caller only has 
permissions
    starting at some prefix.

----


> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
>
> According to S3 semantics, there is no conflict if a bucket contains a key 
> named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are, 
> after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} does go out of its way to 
> traverse every parent path component for the directory it's trying to create, 
> making sure there's no file with that name. This is suboptimal for three main 
> reasons:
>  * Wasted API calls, since the client is getting metadata for each path 
> component 
>  * This can cause *major* problems with buckets whose permissions are being 
> managed by IAM, where access may not be granted to the root bucket, but only 
> to some prefix. When you call {{mkdirs}}, even on a prefix that you have 
> access to, the traversal up the path will cause you to eventually hit the 
> root bucket, which will fail with a 403 - even though the directory creation 
> call would have succeeded.
>  * Some people might actually have a file whose key is a prefix of another 
> object's key... I can't see why they would want to do that, but it's not 
> against S3's rules.
> I've opened a pull request with a simple patch that just removes this portion 
> of the check. I have tested it with my team's instance of Spark + Luigi, and 
> can confirm it works, and resolves the aforementioned permissions issue for a 
> bucket on which we only had prefix access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm 
> not following some convention properly :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
