[ https://issues.apache.org/jira/browse/HADOOP-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513622#comment-16513622 ]
Steve Loughran commented on HADOOP-15542:
-----------------------------------------
going to have to close as a WONTFIX, I'm afraid. You are not allowed to create
files or directories under files in any filesystem, and the semantics of the
Hadoop FS are that, for a directory, {{schema:///path == schema:///path + "/"}}.
That is, {{getFileStatus("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7")}} and
{{getFileStatus("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7/")}} must return the same
value.
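A minimal sketch of that invariant in Java, assuming the bucket and paths from this issue exist and the S3A connector is configured (illustration only, not a test from the codebase):
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrailingSlashInvariant {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), conf);

    // Hadoop's Path class normalizes away the trailing slash, so both
    // strings must resolve to the same entry; the FS contract depends on it.
    FileStatus a = fs.getFileStatus(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7"));
    FileStatus b = fs.getFileStatus(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7/"));

    System.out.println(a.getPath().equals(b.getPath()));    // expected: true
    System.out.println(a.isDirectory() == b.isDirectory()); // expected: true
  }
}
{code}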
Without this rule, listings would break:
{{listFiles("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7")}} would just return that
file entry, so the children wouldn't be found, the next Spark query wouldn't
find the generated parquet files, etc.
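To make that concrete, a sketch of such a listing (same hypothetical bucket as above); a recursive {{listFiles()}} on the d7 prefix only surfaces the parquet files below it if d7 resolves as a directory:
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ListChildren {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), new Configuration());

    // Recursively list everything under d7. If d7 were treated as a file,
    // the listing would stop at that single entry and the parquet files
    // underneath would never be returned to the caller (e.g. a Spark query).
    RemoteIterator<LocatedFileStatus> it =
        fs.listFiles(new Path("s3a://mybucket/d1/d2/d3/d4/d5/d6/d7"), true);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}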
This is the same as what you get from a POSIX FS:
{code}
$ touch /tmp/something.txt
$ mkdir /tmp/something.txt/child
mkdir: /tmp/something.txt: Not a directory
{code}
S3A does let you create a file under a file if it is not the immediate parent;
it's only {{mkdirs()}} which does the whole treewalk. See HADOOP-13221 for the
discussion there, which comes down to "we know it's wrong, but it'd be very
expensive and nobody seems to have noticed".
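A sketch of that asymmetry, with hypothetical paths; the behaviour is as described above and in HADOOP-13221, not something this snippet verifies:
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateVsMkdirs {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), new Configuration());

    // Assume this object already exists in the bucket as a plain file.
    Path file = new Path("s3a://mybucket/dir/file.txt");

    // create() only checks the immediate parent, so writing several levels
    // below an existing file can succeed on S3A.
    fs.create(new Path(file, "a/b/c.txt")).close();

    // mkdirs() walks the entire ancestor chain, hits file.txt, and throws
    // FileAlreadyExistsException ("Can't make directory ... since it is a file").
    fs.mkdirs(new Path(file, "a/b"));
  }
}
{code}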
Sorry
BTW, if you are using S3A as the destination for your work, what's your
strategy for ensuring safe commits? S3Guard, the S3A committers, or just hoping
that the problem doesn't surface?
> S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of
> a directory tree
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-15542
> URL: https://issues.apache.org/jira/browse/HADOOP-15542
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools
> Affects Versions: 2.7.5, 3.1.0
> Reporter: t oo
> Priority: Blocker
>
> We are running Apache Spark jobs with aws-java-sdk-1.7.4.jar and
> hadoop-aws-2.7.5.jar to write parquet files to an S3 bucket. We have the key
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7' in S3 (d7 being a text file). We also
> have keys
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180615/a.parquet'
> (a.parquet being a file)
> When we run a Spark job to write a b.parquet file under
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/' (i.e. we would
> like 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/b.parquet'
> to be created in S3), we get the error below:
>
>
> org.apache.hadoop.fs.FileAlreadyExistsException: Can't make directory for
> path 's3a://mybucket/d1/d2/d3/d4/d5/d6/d7' since it is a file.
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:861)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
>