[ 
https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-13760:
-----------------------------------
    Attachment: HADOOP-13760-HADOOP-13345.002.patch

[~fabbri] - Just the unit tests, with the Null, Local, and Dynamo
implementations. I'm also seeing an encryption test and the one after it
fail - I haven't fully looked into it yet, but they succeed in isolation,
so I'm assuming it's HADOOP-14305. As you pointed out offline, -Dlocal
doesn't do anything, but because Local is the default it still ran the
tests as I intended. It definitely exercised all 3 implementations,
because I saw failures clearly related to each one that I had to fix. I'm
getting ready to run some actual workloads on an actual cluster, too.

[~ste...@apache.org] - schema versioning aside, this would cause clusters
running the old code to keep including deleted items in listings, so it
effectively prolongs the inconsistency I'm trying to eliminate until the
tombstone gets pruned or otherwise removed.
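
To make that compatibility issue concrete, the new-reader side looks
roughly like this - a minimal sketch with illustrative names, not the
actual patch's classes:

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: new readers drop tombstoned entries from listings.
 * Old readers skip this check, so they keep returning deleted paths
 * until the tombstone is pruned - the prolonged inconsistency above.
 */
class TombstoneFilterSketch {
  interface Entry {
    boolean isDeleted();    // the tombstone flag this patch adds
  }

  static <T extends Entry> List<T> visibleEntries(List<T> listing) {
    List<T> visible = new ArrayList<>();
    for (T entry : listing) {
      if (!entry.isDeleted()) {   // only new readers filter here
        visible.add(entry);
      }
    }
    return visible;
  }
}
{code}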

Attaching another incremental patch. I've implemented the TODO to filter
out deleted children server-side when we're deciding whether a directory
is empty. I'm not sure I like this - the docs indicate there are limits on
the pre-filtering data size that very large directories may hit. I'm not
clear on whether regular queries would hit the same limits, but with large
directories this saves us some network traffic (though not read-capacity
usage against quotas). I also need to dig into the use of .withMaxResults.
In my .001 patch I was applying that limit before filtering out deletes,
so it's only luck that the tests didn't conclude non-empty directories
were empty; I need to add a test to catch that. I'm also not sure whether
that limit applies before or after filtering - if it applies before, I
shouldn't use it.
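
For reference, the server-side filtering I mean is shaped roughly like
this - a sketch using the SDK's document API, where the table and
attribute names are made up, not the actual S3Guard schema. One data
point on the open question: the DynamoDB docs say the low-level Limit
parameter caps the number of items *read* before a filter expression is
applied, so whether .withMaxResults maps onto that parameter is exactly
what needs checking.

{code:java}
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

class EmptyDirCheckSketch {
  /** True if the directory has at least one non-deleted child. */
  static boolean hasVisibleChild(DynamoDB dynamo, String parentKey) {
    Table table = dynamo.getTable("metadata");   // hypothetical table
    QuerySpec spec = new QuerySpec()
        .withKeyConditionExpression("parent = :p")
        // Filter tombstones server-side: saves network traffic, but
        // filtered items still count against read capacity.
        .withFilterExpression(
            "attribute_not_exists(is_deleted) OR is_deleted = :f")
        .withValueMap(new ValueMap()
            .withString(":p", parentKey)
            .withBoolean(":f", false));
    // Deliberately no max-results cap here until we know whether it
    // becomes the Limit parameter, which DynamoDB applies before the
    // filter expression runs.
    for (Item ignored : table.query(spec)) {
      return true;   // first surviving child proves non-empty
    }
    return false;
  }
}
{code}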

Also added a test that does a circular series of renames, plus a few
fixes it required. Most notably, if a directory is created and then
renamed quickly enough that S3 doesn't yet return it in listings, we used
to throw a FileNotFoundException while deciding whether it was empty. We
now assume it IS empty.
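
That fallback is roughly this shape (illustrative names, not the actual
S3AFileSystem code):

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

class EmptyDirFallbackSketch {
  interface Lister {
    List<String> listChildren(String path) throws IOException;
  }

  /**
   * Sketch: list-after-create lag in S3 can make a brand-new directory
   * invisible to listings. Instead of propagating FileNotFoundException
   * from the emptiness check during rename, treat the directory as
   * empty - nothing can be in it yet.
   */
  static boolean isEmptyDirectory(Lister lister, String path)
      throws IOException {
    try {
      return lister.listChildren(path).isEmpty();
    } catch (FileNotFoundException e) {
      return true;   // not visible in listings yet, so must be empty
    }
  }
}
{code}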

> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, 
> HADOOP-13760-HADOOP-13345.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  
> To make deletes consistent, we need to add a {{isDeleted}} flag to 
> {{PathMetadata}} and check it when returning results from functions like 
> {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added TODO 
> comments in most of the places these new conditions are needed.  The work 
> does not look too bad.
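
As a rough illustration of the getFileStatus() side of the check
described above - a sketch against the described API with illustrative
helper types, not the actual patch:

{code:java}
import java.io.FileNotFoundException;

class GetFileStatusSketch {
  interface Store {
    Entry get(String path);   // null when no entry is recorded
  }

  interface Entry {
    boolean isDeleted();      // the proposed PathMetadata flag
    String fileStatus();      // stand-in for the real FileStatus
  }

  static String getFileStatus(Store store, String path)
      throws FileNotFoundException {
    Entry meta = store.get(path);
    if (meta == null || meta.isDeleted()) {
      // A tombstoned entry must behave exactly like a missing one.
      throw new FileNotFoundException(path);
    }
    return meta.fileStatus();
  }
}
{code}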



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
