[ 
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835944#comment-16835944
 ] 

Aaron Fabbri commented on HADOOP-16279:
---------------------------------------

Thanks for the work on this stuff [~gabor.bota]. I commented on the PR. The 
logic looks pretty good, but I think the design needs discussion here. The 
current patch combines two different ideas:

1. "Authoritative TTL": how fresh a MetadataStore entry needs to be for S3A to 
skip the S3 query.
2. "Max entry lifetime" in the MetadataStore.

I think these concepts should be kept separate in the public APIs/configs at 
least.

There are a couple of cases when querying the MetadataStore (MS):
I. MetadataStore returns null (no information on that path).
II. MetadataStore returns something (it has a metadata entry for that path).
  II.a. The entry is newer than the authoritative TTL (S3A may short-circuit 
and skip the S3 query).
  II.b. The entry is older than the authoritative TTL (there is data, but S3A 
also needs to query S3).

The patch combines II.b and I.
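The case split above can be sketched roughly as follows. This is only an illustration of the interim behavior described here, not the actual S3Guard code; the class, the {{AUTH_TTL_MS}} constant, and the simplified {{PathMetadata}} are hypothetical stand-ins.

```java
import java.util.concurrent.TimeUnit;

class TtlSketch {
    // Hypothetical "authoritative TTL" value; the real config name and
    // default are up for discussion in this issue.
    static final long AUTH_TTL_MS = TimeUnit.MINUTES.toMillis(5);

    // Simplified stand-in for the real PathMetadata: only the timestamp matters here.
    static class PathMetadata {
        final long lastUpdatedMs;
        PathMetadata(long lastUpdatedMs) { this.lastUpdatedMs = lastUpdatedMs; }
    }

    /** Case I: null in, null out. Case II.a: fresh entry, S3A may skip S3.
     *  Case II.b: stale entry; interim behavior returns null so the caller
     *  falls through to an S3 query (this is what conflates II.b with I). */
    static PathMetadata getWithTtl(PathMetadata fromStore, long nowMs) {
        if (fromStore == null) {
            return null;                                  // case I
        }
        if (nowMs - fromStore.lastUpdatedMs <= AUTH_TTL_MS) {
            return fromStore;                             // case II.a
        }
        return null;                                      // case II.b (interim)
    }
}
```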

Sticking with the "general design, specific implementation" ideal, I'd keep the 
public interfaces and config params designed as above. That doesn't prevent you 
from doing a simpler implementation (e.g. for now, return null from 
S3Guard.getWithTtl() in case II.b, as you do in your patch; that works because 
it *does* cause S3A to query S3).

So the patch made sense except for the naming and description of the 
configuration parameter (I think it should specifically govern whether an entry 
is "authoritative", not whether an entry exists in the MS). I also didn't 
understand why we need more prune() functions added to the MS interface. And I 
thought LocalMetadataStore's use of the guava Cache meant the work was already 
done there?

My hope is that later on we can replace this implementation of II.b (where 
getWithTtl() returns null) with smarter logic that lets you set a policy for 
handling S3-versus-MS conflicts. (In that case, get() returns a PathMetadata; 
S3A checks whether the auth TTL has expired; if so, it still queries S3, and if 
the data in S3 and the MS conflict, it takes action depending on the configured 
conflict policy.)
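As a rough sketch of what that future conflict-policy step might look like: the enum values, the {{resolve()}} helper, and the string statuses below are all hypothetical, chosen just to show the shape of the decision, not any existing S3Guard API.

```java
class ConflictSketch {
    // Hypothetical policy choices for when S3 and the MetadataStore disagree
    // after the authoritative TTL has expired.
    enum ConflictPolicy { PREFER_S3, PREFER_METADATA_STORE, FAIL }

    /** Pick a winner between the S3 view and the MS view of a path.
     *  Statuses are illustrative strings, e.g. "file" vs "tombstone". */
    static String resolve(String s3Status, String msStatus, ConflictPolicy policy) {
        if (s3Status.equals(msStatus)) {
            return s3Status;                       // no conflict; nothing to decide
        }
        switch (policy) {
            case PREFER_S3:             return s3Status;
            case PREFER_METADATA_STORE: return msStatus;
            default: throw new IllegalStateException(
                "S3/MS conflict: " + s3Status + " vs " + msStatus);
        }
    }
}
```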

Shout if I can clarify this at all.



> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-16279
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16279
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Gabor Bota
>            Assignee: Gabor Bota
>            Priority: Major
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and 
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}} 
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but 
> the implementation is not done yet. 
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry 
> I would like to start a debate on whether we need separate expiry times for 
> entries and tombstones. My +1 is on not using separate settings - so only one 
> config name and value.
> ----
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore, 
> using an existing feature in guava's cache implementation. Expiry is set with 
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL are different. That TTL relies on the 
> guava cache's internal mechanism for expiring its entries. This one is an 
> S3AFileSystem-level solution in S3Guard, a layer above all metadata stores.
> * This is also not the same as, and does not use, the [DDB's TTL 
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
>  We need different behavior from what ddb promises: [cleaning once a day with 
> a background 
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
>  is not usable for this feature - although it can be used as a general 
> cleanup solution, separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listings.
> * All entries can expire. The returned metadata from the MS will then be 
> null.
> * Add two new methods, pruneExpiredTtl() and pruneExpiredTtl(String 
> keyPrefix), to the MetadataStore interface. These methods will delete all 
> expired metadata from the ms.
> * Use the last_updated field in the ms for both file metadata and 
> authoritative directory expiry.
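A minimal sketch of the proposed pruneExpiredTtl(String keyPrefix) behavior: drop every entry (or tombstone) whose last_updated is older than the TTL. The class and the plain in-memory map are hypothetical simplifications; real MetadataStore implementations (DynamoDB, local) would do this against their own storage.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PruneSketch {
    // Illustrative stand-in for the store: path -> last_updated (epoch millis).
    final Map<String, Long> lastUpdated = new ConcurrentHashMap<>();

    /** Remove all entries under keyPrefix whose last_updated is older than ttlMs. */
    void pruneExpiredTtl(String keyPrefix, long ttlMs, long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastUpdated.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getKey().startsWith(keyPrefix) && nowMs - e.getValue() > ttlMs) {
                it.remove();  // expired under this prefix: delete from the store
            }
        }
    }
}
```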



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
