[ https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037135#comment-17037135 ]
Steve Loughran commented on HADOOP-13278:
-----------------------------------------

Revisiting:
* If all we want to do is stop dir-under-file, we can just do a plain HEAD <prefix> all the way up.
* If we want to track what to delete as we go up, we can do a HEAD on <prefix> + "/", but that runs the risk of creating 404 entries.
* And if we optimize for a dir going under a non-empty dir, we should do a LIST first.

> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3, tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
>
> According to S3 semantics, there is no conflict if a bucket contains a key
> named {{a/b}} and also a directory named {{a/b/c}}. "Directories" in S3 are,
> after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} goes out of its way to
> traverse every parent path component of the directory it is trying to create,
> making sure there is no file with that name. This is suboptimal for three
> main reasons:
> * Wasted API calls, since the client fetches metadata for each path
> component.
> * It can cause *major* problems with buckets whose permissions are managed
> by IAM, where access may not be granted to the root bucket but only to some
> prefix. When you call {{mkdirs}}, even on a prefix that you have access to,
> the traversal up the path will eventually hit the root bucket, which fails
> with a 403 - even though the directory creation call itself would have
> succeeded.
> * Some people might actually have a file that matches some other file's
> prefix... I can't see why they would want to do that, but it's not against
> S3's rules.
> I've opened a pull request with a simple patch that just removes this
> portion of the check.
> I have tested it with my team's instance of Spark + Luigi, and can confirm
> that it works and resolves the aforementioned permissions issue for a bucket
> on which we only had prefix access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm
> not following some convention properly :)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
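The first option in the comment above (a plain HEAD on each parent prefix, never on prefix + "/") can be sketched as below. This is a minimal illustration, not S3A's actual code: the class name {{MkdirsParentCheck}}, the method {{findBlockingFile}}, and the {{headIsFile}} predicate (a stand-in for a real S3 HEAD such as {{getObjectMetadata}}) are all hypothetical.

```java
import java.util.function.Predicate;

/**
 * Sketch of the "HEAD <prefix> all the way up" strategy: walk every
 * parent prefix of the directory being created and probe it with a
 * plain HEAD, stopping as soon as a file object shadows the path.
 * Because no HEAD is ever issued on prefix + "/", no 404 entries for
 * directory markers can be created along the way.
 */
public class MkdirsParentCheck {

    /** Returns the first ancestor prefix occupied by a file, or null. */
    public static String findBlockingFile(Predicate<String> headIsFile,
                                          String key) {
        String prefix = key;
        int slash;
        while ((slash = prefix.lastIndexOf('/')) > 0) {
            prefix = prefix.substring(0, slash); // strip last path element
            if (headIsFile.test(prefix)) {
                return prefix;                   // file-under-dir conflict
            }
            // No file here: keep walking up. In a real client, a 403 from
            // an IAM prefix-restricted bucket would surface from the HEAD
            // call itself, which is exactly the failure mode the issue
            // description complains about.
        }
        return null;                             // no conflicting file
    }
}
```

A LIST-first variant, as suggested in the third bullet, would instead issue one LIST under the deepest existing ancestor before falling back to this walk.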