[ https://issues.apache.org/jira/browse/HADOOP-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343787#comment-15343787 ]
Chris Nauroth commented on HADOOP-13308:
----------------------------------------
I see a potential weakness in the S3A logic that preserves empty directories
after file deletions. If we consider an S3A instance containing a single file,
/dir1/file1, and a user deletes /dir1/file1, then the sequence of the current
logic is:
# DELETE(/dir1/file1)
# if (!exists(/dir1)) then PUT(/dir1)
We lack atomicity across those 2 steps, so if the process dies between them,
/dir1 will vanish even though the user only deleted a file inside it. Rename
has a similar problem.
I'd like to propose reversing those 2 steps, so that the fake directory gets
created before the delete. Then /dir1 can't vanish. It
would cause a different side effect: if the process dies in between, then we'd
have a fake directory where we don't really need one. I don't think this
impacts correctness, because listStatus always does a listing, even if the
requested directory is a fake. There would probably be a negligible hit
on the bill paid to Amazon for that extra object. This would require a listing
call on the parent to check existence, but that listing call already happens
today via createFakeDirectoryIfNecessary -> exists -> getFileStatus. We're
just reordering that call, so I don't think this is going to be a performance
regression.
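
To make the two orderings concrete, here is a toy sketch in Python. It is not
the S3A code: an in-memory set of keys stands in for the bucket, and every name
in it (ToyStore, delete_then_put, put_then_delete, CrashBetweenSteps) is
hypothetical. It also assumes the reordered version decides whether to write
the marker by listing the parent and checking that the file being deleted is
its only child; the real check may well differ.

```python
class CrashBetweenSteps(Exception):
    """Simulates the process dying between the two object-store calls."""


class ToyStore:
    """In-memory stand-in for a bucket: just a set of object keys.

    A key ending in "/" plays the role of a fake directory marker.
    """

    def __init__(self, keys):
        self.keys = set(keys)

    def exists_dir(self, path):
        prefix = path.rstrip("/") + "/"
        # A directory exists if its marker is present or any key lives under it.
        return any(k.startswith(prefix) for k in self.keys)

    def list_status(self, path):
        prefix = path.rstrip("/") + "/"
        # Always list by prefix; the marker itself is not reported as a child.
        return sorted(k for k in self.keys if k.startswith(prefix) and k != prefix)


def delete_then_put(store, key, parent, crash=False):
    """Current order: DELETE the file, then restore the parent marker if needed."""
    store.keys.discard(key)
    if crash:
        raise CrashBetweenSteps()
    if not store.exists_dir(parent):
        store.keys.add(parent.rstrip("/") + "/")


def put_then_delete(store, key, parent, crash=False):
    """Proposed order: create the parent marker first, then DELETE the file."""
    # Assumption: write the marker only when the file is the parent's sole child.
    if store.list_status(parent) == [key]:
        store.keys.add(parent.rstrip("/") + "/")
    if crash:
        raise CrashBetweenSteps()
    store.keys.discard(key)


def run(op, crash):
    """Apply op to a bucket holding only dir1/file1 and return the final state."""
    store = ToyStore({"dir1/file1"})
    try:
        op(store, "dir1/file1", "dir1", crash=crash)
    except CrashBetweenSteps:
        pass  # the process died; whatever was written so far is what remains
    return store
```

With a crash injected, run(delete_then_put, True) leaves an empty bucket, so
dir1 is gone, while run(put_then_delete, True) leaves both dir1/ and
dir1/file1, and listing dir1 still returns only the file. Without a crash, both
orderings end with just the dir1/ marker.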
What are your thoughts? If it makes sense, this is something I'd file in
Apache under phase 3 (not urgent).
--Chris Nauroth
> S3A delete may fail to preserve parent directory.
> -------------------------------------------------
>
> Key: HADOOP-13308
> URL: https://issues.apache.org/jira/browse/HADOOP-13308
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Chris Nauroth
>
> When a file or directory is deleted in S3A, and the result of that deletion
> makes the parent empty, S3A must store a fake directory (a pure metadata
> object) at the parent to indicate that the directory still exists. The logic
> for restoring fake directories is not resilient to a process death. This may
> cause a directory to vanish unexpectedly after a deletion of its last child.