[ https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636408#comment-16636408 ]
Hudson commented on HADOOP-15621:
---------------------------------
FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #15103 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/15103/])
HADOOP-15621 S3Guard: Implement time-based (TTL) expiry for (fabbri: rev
046b8768af8a07a9e10ce43f538d6ac16e7fa5f3)
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/PathMetadataDynamoDBTranslation.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/PathMetadata.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DirListingMetadata.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DDBPathMetadata.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestPathMetadataDynamoDBTranslation.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java
> S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Aaron Fabbri
> Assignee: Gabor Bota
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15621.001.patch, HADOOP-15621.002.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature
> to the Dynamo metadata store (MS) for S3Guard.
> Conceptually, this is the "online algorithm" counterpart of the CLI prune()
> function, which is the "offline algorithm".
> Why:
> 1. Self healing (soft state): since we do not implement transactions around
> modification of the two systems (s3 and metadata store), certain failures can
> lead to inconsistency between S3 and the metadata store (MS) state. Having a
> time to live (TTL) on each entry in S3Guard means that any inconsistencies
> will be time-bounded. Thus "wait and restart your job" becomes a valid, if
> ugly, way to get around any issues with FS client failure leaving things in a
> bad state.
> 2. We could make manual invocation of `hadoop s3guard prune ...`
> unnecessary, depending on the implementation.
> 3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune
> directories due to the lack of true modification time.
> How:
> I think we need a new column in the dynamo table: "entry last written time".
> This is updated each time the entry is written to dynamo.
> After that we can either
> 1. Have the client simply ignore / elide any entries that are older than the
> configured TTL.
> 2. Have the client delete entries older than the TTL.
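Option 1 above can be illustrated with a minimal sketch. Everything here is hypothetical: the class, the `Entry` type, and the field names are assumptions made for illustration and are not taken from the S3Guard code. It only shows the idea of comparing an "entry last written time" against a configured TTL and skipping expired entries on read, without deleting them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the S3Guard code base. */
public class TtlFilterSketch {

  /** A metadata entry carrying the proposed "entry last written time" value. */
  public static final class Entry {
    public final String path;
    public final long lastWrittenMillis;

    public Entry(String path, long lastWrittenMillis) {
      this.path = path;
      this.lastWrittenMillis = lastWrittenMillis;
    }
  }

  /** True if the entry was last written more than ttlMillis before nowMillis. */
  public static boolean isExpired(Entry e, long ttlMillis, long nowMillis) {
    return nowMillis - e.lastWrittenMillis > ttlMillis;
  }

  /** Option 1: return only live entries; expired ones are skipped, not deleted. */
  public static List<Entry> elideExpired(List<Entry> entries,
                                         long ttlMillis, long nowMillis) {
    List<Entry> live = new ArrayList<>();
    for (Entry e : entries) {
      if (!isExpired(e, ttlMillis, nowMillis)) {
        live.add(e);
      }
    }
    return live;
  }
}
```

Because nothing is deleted, this variant adds no write latency to the read path, at the cost of leaving stale rows in the table until something else removes them.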
> The issue with #2 is that it will increase latency if done inline in the
> context of an FS operation. We could mitigate this somewhat by using an async
> helper thread, or by doing it probabilistically (only sometimes) to amortize
> the expense of deleting stale entries (allowing some batching as well).
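The batching mitigation for option 2 might look roughly like the sketch below. It is an assumption-laden illustration, not the patch's approach: `StaleEntryBatcher`, `noteStale`, and the `deleter` callback are hypothetical names. Stale paths noticed during reads are queued, and the (expensive) delete is issued only once a full batch has accumulated, so most FS operations pay nothing extra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; names are illustrative, not S3Guard APIs. */
public class StaleEntryBatcher {
  private final int batchSize;
  private final Consumer<List<String>> deleter;
  private final List<String> pending = new ArrayList<>();

  public StaleEntryBatcher(int batchSize, Consumer<List<String>> deleter) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    this.batchSize = batchSize;
    this.deleter = deleter;
  }

  /** Record a stale path seen during a read; delete only when a batch is full. */
  public synchronized void noteStale(String path) {
    pending.add(path);
    if (pending.size() >= batchSize) {
      flush();
    }
  }

  /** Hand all pending paths to the deleter in one call and clear the queue. */
  public synchronized void flush() {
    if (pending.isEmpty()) {
      return;
    }
    List<String> batch = new ArrayList<>(pending);
    pending.clear();
    // Note: the callback runs while the lock is held; a real implementation
    // would likely hand the batch to a helper thread instead.
    deleter.accept(batch);
  }
}
```

An async helper thread could call `flush()` periodically so that a partially filled batch does not linger indefinitely.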
> Caveats:
> - Clock synchronization, as usual, is a concern. Many clusters already keep
> clocks close enough via NTP. We should at least document the requirement
> along with the configuration knob that enables the feature.
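To make the clock caveat concrete, here is a small sketch of how the expiry decision depends on the reader's clock. The class and method names are hypothetical; only `java.time.Clock`, `Duration`, and `Instant` are real JDK APIs. Injecting the clock also makes the skew effect easy to reproduce in a test.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch showing how reader clock skew shifts TTL expiry. */
public class ClockSkewDemo {

  /**
   * True if, according to the reader's clock, more than ttl has elapsed
   * since the writer-supplied lastWritten timestamp.
   */
  public static boolean isExpired(Instant lastWritten, Duration ttl,
                                  Clock readerClock) {
    return readerClock.instant().isAfter(lastWritten.plus(ttl));
  }
}
```

For example, with a 60 second TTL, a reader whose clock runs 10 seconds ahead of the writer treats an entry as expired once 50 seconds of real time have passed, so the effective TTL is only as accurate as the clocks are synchronized.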
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)