[
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Bota updated HADOOP-15621:
--------------------------------
Attachment: HADOOP-15621.001.patch
> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> ----------------------------------------------------------------------
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Aaron Fabbri
> Assignee: Gabor Bota
> Priority: Minor
> Attachments: HADOOP-15621.001.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature
> to the Dynamo metadata store (MS) for S3Guard.
> Think of this as the "online algorithm" version of the CLI prune() function,
> which is the "offline algorithm".
> Why:
> 1. Self-healing (soft state): since we do not implement transactions around
> modification of the two systems (S3 and the metadata store), certain failures
> can lead to inconsistency between S3 and the metadata store (MS) state. Having
> a time to live (TTL) on each entry in S3Guard means that any inconsistency
> will be time-bounded. Thus "wait and restart your job" becomes a valid, if
> ugly, way to get around any issues with FS client failure leaving things in a
> bad state.
> 2. We could make manual invocation of `hadoop s3guard prune ...` unnecessary,
> depending on the implementation.
> 3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune
> directories due to the lack of true modification time.
> How:
> I think we need a new column in the DynamoDB table, "entry last written
> time". This is updated each time the entry is written to DynamoDB.
> After that we can either
> 1. Have the client simply ignore / elide any entries that are older than the
> configured TTL.
> 2. Have the client delete entries older than the TTL.
> The issue with option 2 is that it will increase latency if done inline in
> the context of an FS operation. We could mitigate this somewhat by using an
> async helper thread, or by doing it probabilistically (only sometimes) to
> amortize the expense of deleting stale entries (allowing some batching as
> well).
> Caveats:
> - Clock synchronization as usual is a concern. Many clusters already keep
> clocks close enough via NTP. We should at least document the requirement
> along with the configuration knob that enables the feature.
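
Editor's sketch of the two options under "How", assuming each entry carries
the proposed "entry last written time" as epoch milliseconds. Every name here
(TtlSketch, Entry, isExpired, elideExpired, expiredPaths) is hypothetical and
is not taken from the S3Guard code or from the attached patch; the DynamoDB
calls are left out entirely. Option 1 is shown as a read-side filter, and
option 2 is reduced to collecting the stale paths so that a helper thread
could delete them in a batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the S3Guard API.
final class TtlSketch {

    /** A metadata entry with the proposed "entry last written time" (epoch millis). */
    static final class Entry {
        final String path;
        final long lastWrittenMillis;

        Entry(String path, long lastWrittenMillis) {
            this.path = path;
            this.lastWrittenMillis = lastWrittenMillis;
        }
    }

    /** True when the entry was last written more than ttlMillis before nowMillis. */
    static boolean isExpired(Entry e, long nowMillis, long ttlMillis) {
        return nowMillis - e.lastWrittenMillis > ttlMillis;
    }

    /** Option 1: ignore (elide) expired entries on read; nothing is deleted. */
    static List<Entry> elideExpired(List<Entry> entries, long nowMillis, long ttlMillis) {
        List<Entry> live = new ArrayList<>();
        for (Entry e : entries) {
            if (!isExpired(e, nowMillis, ttlMillis)) {
                live.add(e);
            }
        }
        return live;
    }

    /** Option 2 (first half): collect expired paths so they can be deleted as one batch. */
    static List<String> expiredPaths(List<Entry> entries, long nowMillis, long ttlMillis) {
        List<String> stale = new ArrayList<>();
        for (Entry e : entries) {
            if (isExpired(e, nowMillis, ttlMillis)) {
                stale.add(e.path);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long ttl = TimeUnit.MINUTES.toMillis(15); // assumed TTL value, for illustration

        List<Entry> listing = Arrays.asList(
            new Entry("/bucket/fresh", now - TimeUnit.MINUTES.toMillis(1)),
            new Entry("/bucket/stale", now - TimeUnit.HOURS.toMillis(2)));

        System.out.println("live entries: " + elideExpired(listing, now, ttl).size());
        System.out.println("to delete:    " + expiredPaths(listing, now, ttl));
    }
}
```

Because the comparison uses the reading client's clock against a timestamp
written by a possibly different client, clock skew between clients shifts the
effective TTL by the same amount, which is the concern raised under "Caveats".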