[ https://issues.apache.org/jira/browse/HADOOP-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515391#comment-16515391 ]

Steve Loughran commented on HADOOP-15542:
-----------------------------------------

As far as the directory logic is concerned, the path 
{{s3://mybucket/d1/d2/d3/d4/d5/d6/d7}} is equivalent to the path 
{{s3://mybucket/d1/d2/d3/d4/d5/d6/d7/}}, so it is treated as the direct 
parent of anything beneath it.
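
To make that equivalence concrete, here is a rough plain-Java sketch (not the actual S3AFileSystem code; the class and method names are made up for illustration) of how a key and the same key with a trailing slash name the same directory, so {{d7}} counts as the direct parent of {{d7/d8}}:

```java
// Illustrative model of object-store directory keys: a key with or without
// a trailing slash names the same directory.
public class PathModel {
    // Normalize a key to its directory form (with trailing slash).
    static String asDirectoryKey(String key) {
        return key.endsWith("/") ? key : key + "/";
    }

    // True if 'parent' is the immediate parent directory of 'child'.
    static boolean isDirectParent(String parent, String child) {
        String dir = asDirectoryKey(parent);
        if (!child.startsWith(dir)) return false;
        String rest = child.substring(dir.length());
        // Strip a trailing slash so "d8/" and "d8" compare the same.
        if (rest.endsWith("/")) rest = rest.substring(0, rest.length() - 1);
        // A direct child has no further path separators left.
        return !rest.isEmpty() && !rest.contains("/");
    }
}
```

Under this model, {{isDirectParent("d1/d7", "d1/d7/d8")}} and {{isDirectParent("d1/d7/", "d1/d7/d8")}} give the same answer, which is exactly the equivalence described above.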

View it like this: if you were in a local filesystem directory 
{{d1/d2/d3/d4/d5/d6}}, you couldn't have both a file {{d7}} and a directory 
{{d7/}}, as {{ls}}, {{mv}} and {{rm}} wouldn't know which one to act on. The 
Hadoop FS API has the same model on HDFS, localfs, maprfs, etc., and on other 
object stores (adl://, for example). We have to keep that metaphor consistent, 
even though, if you look closely at S3, you can create sets of objects which 
break it.

bq. hive does the reads and does not seem to complain with the file/dir same 
name).

Not something we've ever tested for. If you have a directory structure set up 
and, say, {{s3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/}} is used as the base of 
a query, nothing will notice the file further up the tree. Pass in a query 
with the base {{s3://mybucket/d1/d2/d3/d4/d5/d6/d7}} and it will find the 
file, not any of the children, because we do a HEAD before a LIST: the file 
gets found first.
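
That HEAD-before-LIST ordering can be sketched like this (a simplified plain-Java model, not the real S3A probe code; {{StatusProbe}} and its method are hypothetical names):

```java
import java.util.Set;

// Simplified model of the lookup order described above: a HEAD on the exact
// key is checked first; only if that misses do we LIST under key + "/".
class StatusProbe {
    final Set<String> objects;   // keys of plain objects ("files") in the store

    StatusProbe(Set<String> objects) { this.objects = objects; }

    // Returns "FILE", "DIR", or "NONE" for a path without a trailing slash.
    String probe(String path) {
        if (objects.contains(path)) return "FILE";   // HEAD hits the file first
        String prefix = path + "/";
        for (String k : objects)                     // LIST under the prefix
            if (k.startsWith(prefix)) return "DIR";
        return "NONE";
    }
}
```

With both a {{d1/d7}} object and children under {{d1/d7/}} in the store, probing {{d1/d7}} reports a file, never a directory, because the HEAD succeeds before any LIST is attempted.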

Anyway, WONTFIX. Sorry. If you look at HADOOP-9565, we've discussed in the 
past what a blobstore-specific API would look like, but we have never come up 
with a good model here.
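
For anyone tracing the reported exception, the ancestor check behind it can be sketched as follows (a hedged simplification in plain Java, not the actual {{S3AFileSystem.mkdirs}} implementation; {{checkAncestors}} is an invented name): mkdirs walks up the target path and rejects it if any ancestor exists as a plain file.

```java
import java.util.Set;

// Simplified model of the mkdirs ancestor check: walk up the path and fail
// if any ancestor key already exists as a plain file.
class MkdirsCheck {
    static String checkAncestors(String path, Set<String> fileKeys) {
        String p = path;
        while (p.contains("/")) {
            p = p.substring(0, p.lastIndexOf('/'));
            if (fileKeys.contains(p))
                return "Can't make directory for path '" + path
                        + "' since '" + p + "' is a file";
        }
        return "OK";
    }
}
```

In the reported setup, the object at {{.../d6/d7}} is a file, so any mkdirs for a path beneath {{d7}} trips this check, which matches the stack trace in the issue description below.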

> S3AFileSystem - FileAlreadyExistsException when prefix is a file and part of 
> a directory tree
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15542
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15542
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.5, 3.1.0
>            Reporter: t oo
>            Priority: Blocker
>
> We are running Apache Spark jobs with aws-java-sdk-1.7.4.jar and 
> hadoop-aws-2.7.5.jar to write parquet files to an S3 bucket. We have the key 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7' in s3 (d7 being a text file). We also 
> have the key 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180615/a.parquet' 
> (a.parquet being a file).
> When we run a spark job to write a b.parquet file under 
> 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/' (i.e. we would 
> like 's3://mybucket/d1/d2/d3/d4/d5/d6/d7/d8/d9/part_dt=20180616/b.parquet' 
> to get created in s3), we get the below error:
>  
>  
> org.apache.hadoop.fs.FileAlreadyExistsException: Can't make directory for 
> path 's3a://mybucket/d1/d2/d3/d4/d5/d6/d7' since it is a file.
> at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:861)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
