[ 
https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010721#comment-16010721
 ] 

Sean Mackrory commented on HADOOP-13760:
----------------------------------------

Attached an updated patch incorporating *most* of Fabbri's feedback and that is 
passing all tests. The encryption test issue seems to have disappeared (fairly 
certain no change I made fixed that) and the remaining failures were due to a 
bug when creating parent directories in the metadata store. The logic that 
traversed back up the tree and stopped when it encounted a directory that 
already existed was not tombstone-aware. So it would stop that point, and other 
code that traversed the tree from the root would stop on the other side of the 
same node. So some tests would leave the metadata store in a state that 
subsequent tests couldn't work properly with because there was a subset of the 
tree that was hidden from them and that they couldn't break through into. That 
is fixed.

>> Curious if we can combine three deleted item maps to a single map holding a 
>> struct instead...

Turns out we can. I was trying to follow what we currently did with puts as 
much as it made sense, and just add additional data structures for the 
differences. In the end, once I converted everything to use a single HashMap of 
structs, the rest of the code got a bit cleaner, so I think this is a good 
change.

>> Why create a temporary HashSet just to iterate over keys?

Avoiding ConcurrentModificationException

>> (1) can this add duplicates? (2) How do you know this delayed item (e.g. 
>> /a/b/c) belongs in this listing request (e.g. list /x/y/x)?

Yeah both of these problems could exist, and I'm guessing simply don't show up 
because of the limited, contrived scenario used in the tests that use this. 
Fixed (2) in the latest patch. Forgot about (1) but will address next.


I have no idea why I added the authoritative stuff now that I review it. Agree 
with your comments, and reverting those changes.


I'm going to take a look at refactoring my s3GetFileStatus changes. Your point 
about it being S3-specific is well taken, but the decision that involves 
tombstones needs access to more of the internal state of s3GetFileStatus than 
is returned, so it'll need some rework.

Thanks for the review!

> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, 
> HADOOP-13760-HADOOP-13345.002.patch, HADOOP-13760-HADOOP-13345.003.patch, 
> HADOOP-13760-HADOOP-13345.004.patch, HADOOP-13760-HADOOP-13345.005.patch, 
> HADOOP-13760-HADOOP-13345.006.patch, HADOOP-13760-HADOOP-13345.007.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  
> To make deletes consistent, we need to add a {{isDeleted}} flag to 
> {{PathMetadata}} and check it when returning results from functions like 
> {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added TODO 
> comments in most of the places these new conditions are needed.  The work 
> does not look too bad.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to