[
https://issues.apache.org/jira/browse/HADOOP-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534630#comment-16534630
]
Steve Loughran commented on HADOOP-14124:
-----------------------------------------
Reviewing this. There seem to be a couple of issues, both stemming from the
fact that tools like Cyberduck retain directory markers when creating child
entries, and list under them.
* As S3A bails out fast when it finds a marker, recursive listings from above
a marker dir will not find any children. This is because S3A assumes
marker ==> no children and exits fast rather than scanning all the way down.
I'm minded to consider this a WONTFIX, unless someone really wants the option
to recurse through markers in listings, despite the performance hit.
* When you call listStatus on a dir with both a marker and children, the
listing returns both. This is not good, and we should think about what to do
here. Filter the marker out?
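To make the "filter it out" option concrete, here is a minimal sketch in Python. It models a directory listing as a plain list of object keys; it is not S3A code, and the function name drop_self_marker is hypothetical. It assumes a marker is an object whose key is the directory path plus a trailing "/", and that a directory holding only its marker should list as empty.

```python
def drop_self_marker(dir_key, keys):
    """Return the listing of dir_key without the directory's own marker.

    dir_key: directory path, with or without a trailing "/".
    keys:    object keys returned when listing under that directory.
    """
    # Assumption: the marker object is named "<dir>/".
    marker = dir_key if dir_key.endswith("/") else dir_key + "/"
    # Everything except the marker is a real child entry.
    return [key for key in keys if key != marker]
```

With this, a listing of ["data/", "data/a.txt"] under "data" comes back as just the file, and a directory that only has its marker lists as empty.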
We could also add a cleanup command to the CLI which does a full list and
deletes any dir markers for which there are children. Not sure how to do that
efficiently on a very, very large directory tree.
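One possible way to keep such a cleanup cheap, sketched in Python rather than against the real S3A code: if the full listing arrives in lexicographic key order (which, as far as I know, is how S3 returns list results), then every key under a marker sorts directly after the marker itself. A marker has children exactly when the next key in the stream starts with it, so a single streaming pass with one key of look-behind is enough. The function name redundant_markers is hypothetical, and the actual deletion is left out.

```python
def redundant_markers(sorted_keys):
    """Yield directory markers that have at least one key beneath them.

    sorted_keys: iterable of object keys in lexicographic order,
                 for example a paginated full listing streamed lazily.
    """
    previous = None
    for key in sorted_keys:
        # A marker ends in "/"; any child sorts immediately after it.
        if (previous is not None
                and previous.endswith("/")
                and key.startswith(previous)):
            yield previous
        previous = key


if __name__ == "__main__":
    listing = sorted(["a/", "a/b/", "a/b/c.txt", "d/", "e/", "e/f"])
    for marker in redundant_markers(listing):
        print("would delete marker:", marker)
```

Because it only remembers the previous key, memory use stays constant however large the tree is; the cost is still one full listing of the tree.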
> S3AFileSystem silently deletes "fake" directories when writing a file.
> ----------------------------------------------------------------------
>
> Key: HADOOP-14124
> URL: https://issues.apache.org/jira/browse/HADOOP-14124
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3
> Affects Versions: 2.6.0
> Reporter: Joel Baranick
> Priority: Major
> Labels: filesystem, s3
>
> I realize that you guys probably have a good reason for {{S3AFileSystem}} to
> clean up "fake" folders when a file is written to S3. That said, the fact
> that it silently does this feels like a separation of concerns issue. It
> also leads to weird behavior issues where calls to
> {{AmazonS3Client.getObjectMetadata}} for folders work before calling
> {{S3AFileSystem.create}} but not after. Also, there seems to be no mention
> in the javadoc that the {{deleteUnnecessaryFakeDirectories}} method is
> automatically invoked. Lastly, it seems like the goal of {{FileSystem}}
> should be to ensure that code built on top of it is portable to different
> implementations. This behavior is an example of a case where this can break
> down.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)