[ https://issues.apache.org/jira/browse/HADOOP-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513622#comment-16513622 ]
Steve Loughran commented on HADOOP-15542:
-----------------------------------------
going to have to close as a WONTFIX, I'm afraid. You are not allowed to create
files or directories under files in any filesystem, and the semantics of the
Hadoop FS are that, for a directory, {{schema:///path == schema:///path + "/"}}.
That is, {{getFileStatus("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7")}} and
{{getFileStatus("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7/")}} must return the same
value.
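A minimal sketch of that invariant in Java, assuming the bucket and paths from this issue exist and the S3A connector is configured (illustration only, not a test from the codebase):
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrailingSlashInvariant {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), conf);

    // Hadoop's Path class normalizes away the trailing slash, so both
    // strings must resolve to the same entry; the FS contract depends on it.
    FileStatus a = fs.getFileStatus(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7"));
    FileStatus b = fs.getFileStatus(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7/"));

    System.out.println(a.getPath().equals(b.getPath()));    // expected: true
    System.out.println(a.isDirectory() == b.isDirectory()); // expected: true
  }
}
{code}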
Without this rule, listings would break:
{{listFiles("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7")}} would just return that
file entry, so the children wouldn't be found, the next Spark query wouldn't
find the generated parquet files, etc.
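To make that concrete, a sketch of such a listing (same hypothetical bucket as above); a recursive {{listFiles()}} on the d7 prefix only surfaces the parquet files below it if d7 resolves as a directory:
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListChildren {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), new Configuration());

    // Recursively list everything under d7. If d7 were treated as a file,
    // the listing would stop at that single entry and the parquet files
    // underneath would never be returned to the caller (e.g. a Spark query).
    RemoteIterator<LocatedFileStatus> it =
        fs.listFiles(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7"), true);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}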
This is the same as what you get from a POSIX FS:
{code}
$ touch /tmp/something.txt
$ mkdir /tmp/something.txt/child
mkdir: /tmp/something.txt: Not a directory
{code}
S3A does let you create a file under a file if it is not the immediate parent;
it's only {{mkdirs()}} which does the whole treewalk. See HADOOP-13221 for the
discussion there, which comes down to "we know it's wrong, but it'd be very
expensive and nobody seems to have noticed".
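A sketch of that asymmetry, with hypothetical paths; the behaviour is as described above and in HADOOP-13221, not something this snippet verifies:
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateVsMkdirs {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), new Configuration());

    // Assume this object already exists in the bucket as a plain file.
    Path file = new Path("s3a://mybucket/dir/file.txt");

    // create() only checks the immediate parent, so writing several levels
    // below an existing file can succeed on S3A.
    fs.create(new Path(file, "a/b/c.txt")).close();

    // mkdirs() walks the entire ancestor chain, hits file.txt, and throws
    // FileAlreadyExistsException ("Can't make directory ... since it is a file").
    fs.mkdirs(new Path(file, "a/b"));
  }
}
{code}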
Sorry
BTW, if you are using S3A as the destination for your work, what's your
strategy for ensuring safe commits? S3Guard, the S3A committers, or just hoping
that the problem doesn't surface?
> S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of
> a directory tree
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-15542
> URL: https://issues.apache.org/jira/browse/HADOOP-15542
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools
> Affects Versions: 2.7.5, 3.1.0
> Reporter: t oo
> Priority: Blocker
>
> We are running Apache Spark jobs with aws-java-sdk-1.7.4.jar and
> hadoop-aws-2.7.5.jar to write parquet files to an S3 bucket. We have the key
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7' in S3 (d7 being a text file). We also
> have keys
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180615/a.parquet'
> (a.parquet being a file)
> When we run a Spark job to write a b.parquet file under
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/' (i.e. we would
> like 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/b.parquet'
> to be created in S3), we get the error below:
>
>
> org.apache.hadoop.fs.FileAlreadyExistsException: Can't make directory for
> path 's3a://mybucket/d1/d2/d3/d4/d5/d6/d7' since it is a file.
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:861)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
>