[ 
https://issues.apache.org/jira/browse/HADOOP-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534630#comment-16534630
 ] 

Steve Loughran commented on HADOOP-14124:
-----------------------------------------

Reviewing this. There seem to be a couple of issues related to the fact 
that tools like Cyberduck retain directory markers when creating child 
entries, and list under them.

* As S3A bails out fast when it finds a marker, recursive listings from above a 
marker dir will not find any children. This is because S3A assumes marker ==> 
no children and exits fast rather than scanning all the way down. I'm minded to 
consider this a WONTFIX, unless someone really wants the option to recurse 
through markers in listings, despite the performance hit.
* When you call listStatus on a dir with both a marker and children, the listing 
returns both. This is not good and we should think about what to do here. Filter 
the marker out?
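
A minimal sketch of the "filter it out" option, working on raw object keys 
rather than the real S3A status objects. The class and method names are 
hypothetical, and this is not how S3A currently behaves:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListingFilter {
    /**
     * Drop the directory's own marker from a listing of the keys under dirKey.
     * Assumption: dirKey ends with "/" (e.g. "data/logs/"), and the marker is
     * stored under exactly that key.
     */
    public static List<String> withoutOwnMarker(String dirKey, List<String> listedKeys) {
        return listedKeys.stream()
                .filter(key -> !key.equals(dirKey)) // keep children, skip the marker itself
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listed = List.of("data/logs/", "data/logs/a.txt", "data/logs/sub/");
        System.out.println(withoutOwnMarker("data/logs/", listed));
    }
}
```

Markers of child directories (such as "data/logs/sub/" here) are kept, since 
they are legitimate entries of the listing.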

We could also add a cleanup command to the CLI which does a full listing and 
deletes any dir markers which have children. Not sure how to do that 
efficiently on a very, very large directory tree.
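
One possible approach, sketched below under the assumption that the full 
listing arrives in ascending key order (as S3 object listings generally do): 
a marker "dir/" sorts immediately before any key that starts with "dir/", so 
a single streaming pass that only compares each key with the previous one is 
enough. The class and method names are hypothetical; nothing here is an 
existing S3A command.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkerCleanup {
    /**
     * Find directory markers (keys ending in "/") that have at least one child.
     * Assumption: sortedKeys yields unique keys in ascending lexicographic order,
     * so any child of a marker directly follows that marker.
     */
    public static List<String> redundantMarkers(Iterator<String> sortedKeys) {
        List<String> redundant = new ArrayList<>();
        String previous = null;
        while (sortedKeys.hasNext()) {
            String key = sortedKeys.next();
            // The previous key is a marker and the current key lives under it.
            if (previous != null && previous.endsWith("/") && key.startsWith(previous)) {
                redundant.add(previous);
            }
            previous = key;
        }
        return redundant;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a/", "a/b/", "a/b/c.txt", "d/", "e.txt");
        // "a/" and "a/b/" have children; "d/" is an empty directory and is kept.
        System.out.println(redundantMarkers(keys.iterator()));
    }
}
```

A real command would presumably delete the markers in batches as they are 
found rather than collecting them all in memory, and it would still have to 
page through every key in the tree, so the cost stays proportional to the 
number of objects.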

> S3AFileSystem silently deletes "fake" directories when writing a file.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-14124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14124
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Joel Baranick
>            Priority: Major
>              Labels: filesystem, s3
>
> I realize that you guys probably have a good reason for {{S3AFileSystem}} to 
> clean up "fake" folders when a file is written to S3.  That said, the fact 
> that it silently does this feels like a separation of concerns issue.  It 
> also leads to weird behavior issues where calls to 
> {{AmazonS3Client.getObjectMetadata}} for folders work before calling 
> {{S3AFileSystem.create}} but not after.  Also, there seems to be no mention 
> in the javadoc that the {{deleteUnnecessaryFakeDirectories}} method is 
> automatically invoked. Lastly, it seems like the goal of {{FileSystem}} 
> should be to ensure that code built on top of it is portable to different 
> implementations.  This behavior is an example of a case where this can break 
> down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

