[ 
https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985252#comment-15985252
 ] 

Sean Mackrory commented on HADOOP-13760:
----------------------------------------

I did a deep dive into what happens when renaming a directory containing a mix 
of nested directories, files, and empty directories. Key things I learned:

* listFilesAndDirectories should really be named listFilesAndEmptyDirectories: 
the iterator doesn't return separate items for non-empty directories. 
[~fabbri] suggested off-line that we at least add a test for that, to prevent 
it from being "fixed" in the future, and we should rename it too. I don't see a 
need to implement a list-all-filesystem-vertices function right now. This 
isn't a problem for its current uses: it was added so that S3GuardTool didn't 
miss empty directories when importing, and the rest of the import process 
takes care of the non-empty directories. It just so happens that here it 
behaves pretty much the same as the request it's replacing (except that it 
filters out tombstones, empty directories don't end with a '/', etc.), and 
that appears to be perfectly correct.

* The only increase in any metrics I could find is that listFilesAndDirectories 
performs a couple more list and object metadata requests than we were doing 
before, if and only if S3Guard is disabled. We could avoid that by having 
separate code in innerRename to filter out tombstones, but my previous 
concerns about that approach still apply. I have some workloads running now to 
see how much the extra requests affect real performance on them. I'll post 
details when I have them.
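
As a rough illustration of the listing behaviour described in the first point 
(files and empty directories are returned, non-empty directories are only 
implied by their children, and tombstoned paths are filtered out), here is a 
minimal sketch. It is a toy in-memory model, not the S3AFileSystem or S3Guard 
code: the class ListingSketch, its methods, and the way tombstones are stored 
are all hypothetical, and tombstones are assumed to mark individual leaf paths 
only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only (hypothetical): not the S3AFileSystem or S3Guard API. */
final class ListingSketch {
    /** path -> isDirectory, kept sorted so listings are deterministic. */
    private final TreeMap<String, Boolean> entries = new TreeMap<>();
    /** Paths recorded as deleted; a stand-in for tombstones. */
    private final Set<String> tombstones = new TreeSet<>();

    void addFile(String path) { entries.put(path, false); }
    void addDir(String path)  { entries.put(path, true); }
    void delete(String path)  { tombstones.add(path); }

    /** True if any non-tombstoned entry exists underneath dir. */
    private boolean hasLiveChild(String dir) {
        String prefix = dir + "/";
        for (String p : entries.keySet()) {
            if (p.startsWith(prefix) && !tombstones.contains(p)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns files and empty directories under root. Non-empty
     * directories get no item of their own, and tombstoned paths
     * are skipped. Directory names carry no trailing '/'.
     */
    List<String> listFilesAndEmptyDirectories(String root) {
        List<String> out = new ArrayList<>();
        String prefix = root + "/";
        for (Map.Entry<String, Boolean> e : entries.entrySet()) {
            String path = e.getKey();
            if (!path.startsWith(prefix) || tombstones.contains(path)) {
                continue;
            }
            boolean isDir = e.getValue();
            if (!isDir || !hasLiveChild(path)) {
                out.add(path);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ListingSketch fs = new ListingSketch();
        fs.addDir("/src");
        fs.addDir("/src/a");        // non-empty: no item of its own
        fs.addFile("/src/a/f1");
        fs.addDir("/src/empty");    // empty: listed
        fs.addFile("/src/gone");
        fs.delete("/src/gone");     // tombstoned: filtered out
        System.out.println(fs.listFilesAndEmptyDirectories("/src"));
    }
}
```

In this model a rename that walks the listing would still have to create 
"/src/a" at the destination implicitly, via its child "/src/a/f1", which is 
the same reason a test pinning down this behaviour seems worthwhile.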

To add to the functional testing, I ran a bunch of Hive-on-MR and Hive-on-Spark 
workloads and everything still worked correctly.

> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, 
> HADOOP-13760-HADOOP-13345.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  
> To make deletes consistent, we need to add a {{isDeleted}} flag to 
> {{PathMetadata}} and check it when returning results from functions like 
> {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added TODO 
> comments in most of the places these new conditions are needed.  The work 
> does not look too bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
