[
https://issues.apache.org/jira/browse/HADOOP-16279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845889#comment-16845889
]
Gabor Bota commented on HADOOP-16279:
-------------------------------------
I'm having a hard time implementing prune for the dynamo ms if we use
last_updated instead of mod_time.
The issue is that we won't update the parent directories' last_updated field,
so the implementation will remove all parent directory if the query includes
directories and not just files. This looks like the following:
we have
{code:java}
enum PruneMode {
PRUNE_ALL,
PRUNE_TOMBSTONES
}
{code}
so we can remove all expired entries, or just tombstones.
We do a query to get all expired files, but not directories like the following:
{code:java}
private ItemCollection<ScanOutcome> expiredFiles(long lastUpdated,
String keyPrefix, PruneMode pruneMode) throws IOException {
String filterExpression;
switch (pruneMode) {
case PRUNE_ALL:
filterExpression =
"last_updated < :last_updated and begins_with (parent, :parent)" +
"and is_dir <> :not_is_dir";
break;
case PRUNE_TOMBSTONES:
filterExpression =
"last_updated < :last_updated and begins_with (parent, :parent)" +
"and is_dir <> :not_is_dir and is_deleted = true";
break;
default:
throw new IllegalArgumentException("Prune mode: " + pruneMode + " is "
+ "not supported.");
}
String projectionExpression = "parent,child";
ValueMap map = new ValueMap()
.withLong(":last_updated", lastUpdated)
.withString(":parent", keyPrefix)
.withString(":not_is_dir", "true");
return readOp.retry(
"scan",
keyPrefix,
true,
() -> table.scan(filterExpression, projectionExpression, null, map));
}
{code}
So this can work, but we won't be able to remove directories from dynamodb with
prune from now on.
We need a solution to remove tombstones - so I'd propose to add a new function
for that and leave the current prune as it is - based on mod_time.
The new function would be {{pruneTombstones(long lastUpdated, String
keyPrefix)}} and it would only remove tombstone entries
----
I have some issues with implementing metadata expiry, as many tests start to
fail after I use expiry for entries (even with setting a long time for expiry
like 60 minutes).
> S3Guard: Implement time-based (TTL) expiry for entries (and tombstones)
> -----------------------------------------------------------------------
>
> Key: HADOOP-16279
> URL: https://issues.apache.org/jira/browse/HADOOP-16279
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Gabor Bota
> Assignee: Gabor Bota
> Priority: Major
> Attachments: Screenshot 2019-05-17 at 13.21.26.png
>
>
> In HADOOP-15621 we implemented TTL for Authoritative Directory Listings and
> added {{ExpirableMetadata}}. {{DDBPathMetadata}} extends {{PathMetadata}}
> extends {{ExpirableMetadata}}, so all metadata entries in ddb can expire, but
> the implementation is not done yet.
> To complete this feature the following should be done:
> * Add new tests for metadata entry and tombstone expiry to {{ITestS3GuardTtl}}
> * Implement metadata entry and tombstone expiry
> I would like to start a debate on whether we need to use separate expiry
> times for entries and tombstones. My +1 on not using separate settings - so
> only one config name and value.
> ----
> Notes:
> * In HADOOP-13649 the metadata TTL is implemented in LocalMetadataStore,
> using an existing feature in guava's cache implementation. Expiry is set with
> {{fs.s3a.s3guard.local.ttl}}.
> * LocalMetadataStore's TTL and this TTL is different. That TTL is using the
> guava cache's internal solution for the TTL of these entries. This is an
> S3AFileSystem level solution in S3Guard, a layer above all metadata store.
> * This is not the same, and not using the [DDB's TTL
> feature|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html].
> We need a different behavior than what ddb promises: [cleaning once a day
> with a background
> job|https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html]
> is not usable for this feature - although it can be used as a general
> cleanup solution separately and independently from S3Guard.
> * Use the same ttl for entries and authoritative directory listing
> * All entries can be expired. Then the returned metadata from the MS will be
> null.
> * Add two new methods pruneExpiredTtl() and pruneExpiredTtl(String keyPrefix)
> to MetadataStore interface. These methods will delete all expired metadata
> from the ms.
> * Use last_updated field in ms for both file metadata and authoritative
> directory expiry.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]