[ https://issues.apache.org/jira/browse/HADOOP-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534630#comment-16534630 ]
Steve Loughran commented on HADOOP-14124:
-----------------------------------------

Reviewing this. It seems there are a couple of issues related to the fact that tools like Cyberduck retain directory markers when creating child entries, and list under them.

* As S3A bails out fast when it finds one, recursive listings from above a marker dir will not find any children. This is because S3A assumes marker ==> no children and exits fast rather than scanning all the way down. I'm "minded" to consider this a WONTFIX, unless someone really wants the option to recurse through them in listings, despite the performance hit.
* When you call listStatus on a dir with both a marker and children, the listing returns both. This is not good, and we should think about what to do here. Filter the marker out?

We could also add a cleanup command to the CLI which does a full list and deletes any dir markers for which there are children. I'm not sure how to do that efficiently on a very, very large directory tree.

> S3AFileSystem silently deletes "fake" directories when writing a file.
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-14124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14124
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Joel Baranick
>            Priority: Major
>              Labels: filesystem, s3
>
> I realize that you guys probably have a good reason for {{S3AFileSystem}} to
> clean up "fake" folders when a file is written to S3. That said, the fact
> that it silently does this feels like a separation of concerns issue. It
> also leads to weird behavior where calls to
> {{AmazonS3Client.getObjectMetadata}} for folders work before calling
> {{S3AFileSystem.create}} but not after. Also, there seems to be no mention
> in the javadoc that the {{deleteUnnecessaryFakeDirectories}} method is
> automatically invoked.
> Lastly, it seems like the goal of {{FileSystem}} should be to ensure that
> code built on top of it is portable to different implementations. This
> behavior is an example of a case where this can break down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
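The cleanup-command idea in the comment can be prototyped against a flat listing of object keys. This is an illustrative Python sketch, not S3A code: it assumes a directory marker is any key ending in "/", it works on an in-memory list rather than a real bucket, and the function names (is_marker, redundant_markers, list_children, parent_markers) are hypothetical. The point it shows is that a full-tree cleanup can be one streaming pass: in a sorted listing (S3 returns keys in UTF-8 binary order), every key under a marker forms a contiguous run immediately after that marker, so a marker has children exactly when its immediate successor starts with the marker's prefix.

```python
from typing import Iterable, List, Set


def is_marker(key: str) -> bool:
    """Assumed convention: any key ending in "/" is a directory marker."""
    return key.endswith("/")


def redundant_markers(keys: Iterable[str]) -> List[str]:
    """Return the markers that have at least one child key under them.

    All keys under a marker sort as a contiguous run right after it,
    so comparing each marker with its immediate successor is enough.
    """
    ordered = sorted(set(keys))
    result = []
    for current, following in zip(ordered, ordered[1:]):
        if is_marker(current) and following.startswith(current):
            result.append(current)
    return result


def list_children(keys: Iterable[str], directory: str) -> List[str]:
    """List the immediate children of `directory`, hiding its own marker.

    Subdirectories are returned with a trailing "/", files without.
    An empty `directory` means the bucket root.
    """
    prefix = directory.rstrip("/") + "/" if directory else ""
    children: Set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if rest == "":
            continue  # the directory's own marker: filter it out of the listing
        head, sep, _ = rest.partition("/")
        children.add(head + sep)  # "sub/" for nested keys, "file" for direct files
    return sorted(children)


def parent_markers(key: str) -> List[str]:
    """Marker keys of every ancestor of `key`, nearest first.

    Illustrative only: these are the parent keys a
    "delete fake directories on write" step would look for.
    """
    parts = key.rstrip("/").split("/")[:-1]
    return ["/".join(parts[:i]) + "/" for i in range(len(parts), 0, -1)]


if __name__ == "__main__":
    listing = ["data/", "data/part-0000", "data/logs/", "empty/", "readme.txt"]
    print(redundant_markers(listing))           # ['data/']
    print(list_children(listing, "data"))       # ['logs/', 'part-0000']
    print(parent_markers("data/logs/app.log"))  # ['data/logs/', 'data/']
```

Here "data/" is reported as redundant because "data/logs/" sorts right after it, while the empty markers "data/logs/" and "empty/" are kept. list_children corresponds to the "filter it out" option for listStatus, and parent_markers lists the ancestor keys that a delete-on-write step such as the one the issue describes would target.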