[ https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842139#comment-16842139 ]

Steve Loughran commented on HADOOP-16279:
-----------------------------------------

Addition: looking at the raw DDB table against which I run my tests (i.e. 
switching between auth, non-auth, OOB etc), I can see that there are a few 
hundred entries in there which aren't matched in the store, and whose 
last_updated == 0 but whose modtime > 0. We have to conclude that this is the 
state other people's stores are going to be in: some of these are from new 
tests I've added recently, specifically those generating inconsistencies on 
partial rename failures, among other things.

We do need to consider the special case of lastUpdated == 0, both for old 
data and for updates written by an older client.
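
A minimal sketch of one way to handle that case, assuming a hypothetical 
helper on the store side (the names here are illustrative, not the actual 
S3Guard API):

{code:java}
/**
 * Sketch: decide whether an entry has expired.
 * lastUpdated == 0 means the entry predates TTL support (old data, or
 * written by an older client), so treat it as not expired rather than
 * pruning data we can't reason about.
 */
static boolean isExpired(long lastUpdated, long ttlMillis, long nowMillis) {
  if (lastUpdated == 0) {
    return false;
  }
  return nowMillis - lastUpdated > ttlMillis;
}
{code}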

Attached: screenshot of the store. Worth considering whether we should offer 
an s3guard command to see this low-level detail, e.g. list and/or delete.
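
Purely as an illustration of what such a command might look like (these 
subcommands are hypothetical, not ones that exist today):

{code}
# list raw entries, including last_updated == 0 orphans (hypothetical)
hadoop s3guard dump-raw dynamodb://TABLE

# delete the entries listed above (hypothetical)
hadoop s3guard delete-raw dynamodb://TABLE
{code}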

I'm not going to attempt any cleanup of this myself; a goal of this work and 
the fsck code should be to clean it up.


> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-16279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16279
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Gabor Bota
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: Screenshot 2019-05-17 at 13.21.26.png
>
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}}, 
> which extends {{ExpirableMetadata}}, so all metadata entries in DDB can 
> expire, but the implementation is not done yet.
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need separate expiry times for 
> entries and tombstones. My +1 is on not using separate settings, so only one 
> config name and value; a possible shape for that is sketched below.
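> A possible shape for that single setting, assuming a name like 
> {{fs.s3a.metadatastore.metadata.ttl}} (the property name here is an 
> assumption, not settled):
> {code:xml}
> <property>
>   <!-- hypothetical name; applies to both entries and tombstones -->
>   <name>fs.s3a.metadatastore.metadata.ttl</name>
>   <value>15m</value>
> </property>
> {code}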
> ----
> Notes:
> * In HADOOP-13649 the metadata TTL was implemented in LocalMetadataStore, 
> using an existing feature of guava's cache implementation (see the sketch 
> after these notes). Expiry is set with {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL are different. That TTL uses the 
> guava cache's internal mechanism to expire its entries; this one is an 
> S3AFileSystem-level solution in S3Guard, a layer above all metadata stores.
> * This is also not the same as, and does not use, [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need different behavior from what DDB promises: [cleaning once a day 
> with a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature, although it can be used as a general 
> cleanup solution separately and independently from S3Guard.
> * Use the same TTL for entries and authoritative directory listings.
> * All entries can expire. Once one has, the metadata returned from the MS 
> will be null.
> * Add two new methods, pruneExpiredTtl() and pruneExpiredTtl(String 
> keyPrefix), to the MetadataStore interface. These methods will delete all 
> expired metadata from the MS (a rough interface sketch follows these notes).
> * Use the last_updated field in the MS for both file metadata and 
> authoritative directory expiry.
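> As a minimal sketch of the guava mechanism the LocalMetadataStore note 
> above refers to (illustrative only, not the actual LocalMetadataStore code):
> {code:java}
> import java.util.concurrent.TimeUnit;
> import com.google.common.cache.Cache;
> import com.google.common.cache.CacheBuilder;
>
> public class TtlCacheSketch {
>   public static void main(String[] args) {
>     // Guava evicts entries lazily once they outlive the TTL; there is
>     // no background cleanup thread unless you add one.
>     Cache<String, String> cache = CacheBuilder.newBuilder()
>         .expireAfterAccess(10, TimeUnit.MINUTES)
>         .build();
>     cache.put("/bucket/dir/file", "metadata");
>   }
> }
> {code}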
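> And a rough sketch of how the two proposed prune methods might sit on the 
> interface (only the signatures come from this issue; the surrounding 
> interface is simplified):
> {code:java}
> import java.io.IOException;
>
> public interface MetadataStore {
>   // ... existing methods elided ...
>
>   /** Delete all metadata whose last_updated is older than the TTL. */
>   void pruneExpiredTtl() throws IOException;
>
>   /** As above, restricted to paths under the given key prefix. */
>   void pruneExpiredTtl(String keyPrefix) throws IOException;
> }
> {code}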


