[ https://issues.apache.org/jira/browse/HADOOP-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123792#comment-16123792 ]

Aaron Fabbri commented on HADOOP-14749:
---------------------------------------

{quote}
If we added a field for each entry as to when the record itself was created, 
then we could have AWS TTL do the pruning automatically.
{quote}
I think we will want an "entry last written" mod time field in DDB, but I don't
think we can use DynamoDB's TTL feature without breaking the "all ancestors of
any path P in DDB must be present" invariant. I chatted with a friend who works
on the DynamoDB team, and he did not believe their TTL deletion feature gives
ordering guarantees strong enough to preserve that invariant, even if we could
ensure we always wrote ancestors before children. Maybe there is another
algorithm I'm not thinking of, though.
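
To make the ordering concern concrete, here is a minimal Python sketch (not the
S3Guard code) that models the table as a plain set of absolute paths, with the
root "/" treated as implicitly present. `ancestors`, `invariant_holds` and
`delete_sequence_preserves_invariant` are hypothetical helpers. Deleting
deepest-first keeps the invariant true after every step, while an arbitrary
order, which is all we can assume from TTL expiry, can leave a child whose
parent is already gone.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def ancestors(path: str) -> list[str]:
    """Proper ancestors of an absolute path, nearest first; root "/" is implicit."""
    return [str(p) for p in PurePosixPath(path).parents if str(p) != "/"]


def invariant_holds(entries: set[str]) -> bool:
    """True if every ancestor of every entry is also an entry."""
    return all(a in entries for e in entries for a in ancestors(e))


def delete_sequence_preserves_invariant(entries: set[str], order: list[str]) -> bool:
    """Delete entries one at a time in `order`; False if the invariant ever breaks."""
    remaining = set(entries)
    for path in order:
        remaining.discard(path)
        if not invariant_holds(remaining):
            return False
    return True


if __name__ == "__main__":
    entries = {"/a", "/a/b", "/a/b/c"}
    deepest_first = sorted(entries, key=lambda p: p.count("/"), reverse=True)
    print(delete_sequence_preserves_invariant(entries, deepest_first))             # True
    print(delete_sequence_preserves_invariant(entries, ["/a", "/a/b", "/a/b/c"]))  # False
```

Writing ancestors before children only protects the insert side; the delete
side needs the reverse order, which is exactly what an unordered TTL sweep
does not promise.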

I do think we want a v2 prune implementation for DynamoDB that works better
(i.e. actually expires directories properly). I think authoritative mode
support for DynamoDB will be a big motivator for this: if you are relying on
DDB as the source of truth for listings, then reliable expiry of stale data
becomes more important. I've also been thinking about an online variant of
prune (run on demand in the client, perhaps probabilistically or randomized,
or on access).

> review s3guard docs & code prior to merge
> -----------------------------------------
>
>                 Key: HADOOP-14749
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14749
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14749-HADOOP-13345-001.patch, 
> HADOOP-14749-HADOOP-13345-002.patch, HADOOP-14749-HADOOP-13345-003.patch, 
> HADOOP-14749-HADOOP-13345-004.patch, HADOOP-14749-HADOOP-13345-005.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Pre-merge cleanup while it's still easy to do
> * Read through all the docs, tune
> * Diff the trunk/branch files to see if we can reduce the delta (and hence 
> the changes)
> * Review the new tests



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

