[ 
https://issues.apache.org/jira/browse/HADOOP-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343787#comment-15343787
 ] 

Chris Nauroth commented on HADOOP-13308:
----------------------------------------

I see a potential weakness in the S3A logic that preserves empty directories 
after file deletions.  If we consider an S3A instance containing a single file, 
/dir1/file1, and a user deletes /dir1/file1, then the sequence of the current 
logic is:

# DELETE(/dir1/file1)
# if (!exists(/dir1)) then PUT(/dir1)

Those two steps are not atomic, so if the process dies between them, /dir1 
vanishes: S3 has no real directories, and with file1 gone and no fake 
directory object written, nothing remains under /dir1.  Rename has a similar 
problem.

I'd like to propose reversing those two steps, so that the fake directory is 
created before the delete.  Then /dir1 can't vanish.  The side effect is 
different: if the process dies in between, we're left with a fake directory 
that we don't really need.  I don't think this impacts correctness, because 
listStatus always does a listing, even if the requested directory is a fake.  
There would probably be a negligible hit on the bill paid to Amazon for that 
extra object.  This would require a listing call on the parent to check 
existence, but that listing call already happens today via 
createFakeDirectoryIfNecessary -> exists -> getFileStatus.  We're just 
reordering that call, so I don't expect a performance regression.
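
To make the two orderings concrete, here is a minimal sketch against a toy 
in-memory key set, not the real S3AFileSystem code.  All names in it 
(FakeDirOrdering, deleteCurrent, deleteReordered, survivesCrash) are 
hypothetical, and the "is this the parent's only child" check in the reordered 
version is my assumption about how the existence check might look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy stand-in for a bucket: a sorted set of object keys (hypothetical). */
class FakeDirOrdering {

    /** All keys under the prefix, including a marker key equal to the prefix. */
    static List<String> list(TreeSet<String> keys, String prefix) {
        List<String> out = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order.
        for (String k : keys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) break;
            out.add(k);
        }
        return out;
    }

    /** A directory "exists" if its marker or any child key is present. */
    static boolean dirExists(TreeSet<String> keys, String dirPrefix) {
        return !list(keys, dirPrefix).isEmpty();
    }

    /** Parent prefix with trailing slash; assumes the key has a parent. */
    static String parentOf(String key) {
        return key.substring(0, key.lastIndexOf('/') + 1);
    }

    static void crashIf(boolean crash) {
        if (crash) throw new IllegalStateException("simulated process death");
    }

    /** Current order: DELETE the file, then PUT the fake directory if needed. */
    static void deleteCurrent(TreeSet<String> keys, String key, boolean crash) {
        String parent = parentOf(key);
        keys.remove(key);
        crashIf(crash); // dying here leaves nothing under the parent
        if (!dirExists(keys, parent)) keys.add(parent);
    }

    /** Proposed order: PUT the fake directory first, then DELETE the file. */
    static void deleteReordered(TreeSet<String> keys, String key, boolean crash) {
        String parent = parentOf(key);
        List<String> children = list(keys, parent);
        // Assumed check: the file being deleted is the parent's only entry.
        if (children.size() == 1 && children.get(0).equals(key)) keys.add(parent);
        crashIf(crash); // dying here leaves an extra marker, but the dir survives
        keys.remove(key);
    }

    /** Does dir1/ still exist after a crash between the two steps? */
    static boolean survivesCrash(boolean reordered) {
        TreeSet<String> keys = new TreeSet<>();
        keys.add("dir1/file1");
        try {
            if (reordered) deleteReordered(keys, "dir1/file1", true);
            else deleteCurrent(keys, "dir1/file1", true);
        } catch (IllegalStateException expected) {
            // the simulated process death
        }
        return dirExists(keys, "dir1/");
    }

    public static void main(String[] args) {
        System.out.println("current order: " + survivesCrash(false)); // false
        System.out.println("reordered:     " + survivesCrash(true));  // true
    }
}
```

In this toy model, a crash in the current order leaves the key set empty, while 
a crash in the reordered version leaves both dir1/ and dir1/file1 behind, which 
is the harmless extra-object case described above.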

What are your thoughts?  If it makes sense, I'd file this in Apache under 
phase 3 (not urgent).

--Chris Nauroth

> S3A delete may fail to preserve parent directory.
> -------------------------------------------------
>
>                 Key: HADOOP-13308
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13308
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>
> When a file or directory is deleted in S3A, and the result of that deletion 
> makes the parent empty, S3A must store a fake directory (a pure metadata 
> object) at the parent to indicate that the directory still exists.  The logic 
> for restoring fake directories is not resilient to a process death.  This may 
> cause a directory to vanish unexpectedly after a deletion of its last child.



