[jira] [Comment Edited] (HADOOP-13230) S3A to optionally retain directory markers

Ivan Sadikov (Jira) Mon, 24 Aug 2020 09:25:43 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183420#comment-17183420
 ]


Ivan Sadikov edited comment on HADOOP-13230 at 8/24/20, 4:24 PM:
-----------------------------------------------------------------

[[email protected]] [~steveatbat] Could you explain why we had to delete
directory markers whenever a file is added under the path, e.g. removing {{a/}},
{{b/}}, and {{c/}} when adding a file {{a/b/c/file}}? Was it done to optimise
the performance of certain operations, or for some other reason? I searched the
documentation and code but could not find an explanation for the call to
`deleteUnnecessaryFakeDirectories` in `finishedWrite`. Also, the method does
not seem to check whether a directory is empty, but I suppose that is by
design. I would appreciate it if you could elaborate. Thanks!
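
To make the example in this comment concrete, the parent marker keys of
{{a/b/c/file}} can be enumerated as below. This is a minimal sketch and not the
S3AFileSystem code: the class name `MarkerKeys` and the method `parentMarkers`
are hypothetical, the input is assumed to be a file key that does not end in
"/", and the code only computes which keys would be deletion candidates. It
does not talk to S3 and, like the method asked about, it does not check whether
a directory is empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class MarkerKeys {

    // Returns the directory marker keys above a file key, deepest first.
    // Each marker is the parent path with a trailing "/".
    static List<String> parentMarkers(String key) {
        List<String> markers = new ArrayList<>();
        int slash = key.lastIndexOf('/');
        while (slash > 0) {
            // Keep the trailing slash: "a/b/c/file" -> "a/b/c/"
            markers.add(key.substring(0, slash + 1));
            // Move to the previous "/" to reach the next parent up.
            slash = key.lastIndexOf('/', slash - 1);
        }
        return markers;
    }

    public static void main(String[] args) {
        // Prints [a/b/c/, a/b/, a/]
        System.out.println(parentMarkers("a/b/c/file"));
    }
}
```

Written as full object keys, the markers {{a/}}, {{b/}}, and {{c/}} from the
example are {{a/}}, {{a/b/}}, and {{a/b/c/}}.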


was (Author: sadikovi):
[[email protected]] [~steveatbat] Could you explain why we need to delete
directory markers whenever a file is added under the path, e.g. removing {{a/}},
{{b/}}, and {{c/}} when adding a file {{a/b/c/file}}? Was it done to optimise
the performance of certain operations, or for some other reason? I searched the
documentation and code but could not find an explanation for the call to
`deleteUnnecessaryFakeDirectories` in `finishedWrite`. Also, the method does
not seem to check whether a directory is empty, but I suppose that is by
design. I would appreciate it if you could elaborate. Thanks!

> S3A to optionally retain directory markers
> ------------------------------------------
>
>                 Key: HADOOP-13230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13230
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Aaron Fabbri
>            Assignee: Steve Loughran
>            Priority: Major
>             Fix For: 3.3.1
>
>
> Users of s3a may not realize that, in some cases, it does not interoperate
> well with other S3 tools, such as the AWS CLI. (See HIVE-13778, IMPALA-3558.)
> Specifically, if a user:
> - Creates an empty directory with hadoop fs -mkdir s3a://bucket/path
> - Copies data into that directory via another tool, e.g. the AWS CLI.
> - Tries to access the data in that directory with any Hadoop software.
> Then the last step fails, because the fake empty directory blob that s3a wrote
> in the first step causes s3a (listStatus() etc.) to continue to treat that
> directory as empty, even though the second step was supposed to populate the
> directory with data.
> I wanted to document this fact for users. We may mark this as not-fix, "by
> design". It may also be interesting to brainstorm solutions and/or a config
> option to change the behavior if folks care.
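
The three steps in the description above can be mimicked with a toy in-memory
model. This is only a sketch of the reported symptom, under the assumption that
a lookup which finds the marker key stops there and reports an empty directory.
It is not how S3A is implemented, and the names `MarkerDemo`,
`childrenTrustingMarker`, and `childrenByPrefix` are hypothetical. The bucket is
modelled as a sorted set of object keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model for illustration only; not part of Hadoop.
final class MarkerDemo {

    // Lists every key under dir, ignoring the marker object itself.
    static List<String> childrenByPrefix(Set<String> bucket, String dir) {
        List<String> out = new ArrayList<>();
        for (String key : bucket) {
            if (key.startsWith(dir) && !key.equals(dir)) {
                out.add(key);
            }
        }
        return out;
    }

    // Assumed failure mode: if the marker exists, treat the directory as empty.
    static List<String> childrenTrustingMarker(Set<String> bucket, String dir) {
        if (bucket.contains(dir)) {
            return new ArrayList<>();
        }
        return childrenByPrefix(bucket, dir);
    }

    public static void main(String[] args) {
        Set<String> bucket = new TreeSet<>();
        bucket.add("path/");          // step 1: mkdir writes an empty marker
        bucket.add("path/data.csv");  // step 2: another tool uploads a file
        // Step 3: the two lookups disagree about the directory contents.
        System.out.println(childrenTrustingMarker(bucket, "path/")); // prints []
        System.out.println(childrenByPrefix(bucket, "path/"));       // prints [path/data.csv]
    }
}
```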



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
