[ https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Mackrory updated HADOOP-14041:
-----------------------------------
Attachment: HADOOP-14041-HADOOP-13345.001.patch
Attaching a patch that adds prune(timestamp) to the MetadataStore interface and
existing implementations, along with a CLI tool and tests for all of it. prune()
takes a UTC timestamp as returned by System.currentTimeMillis() and should trim
every entry with a modification time older than that. The CLI tool determines
the timestamp by subtracting the requested lengths of time from the current
time. One tricky detail: minutes are specified with -M. All the time-range
options are upper-case so that they don't clash with -m, which specifies the
metastore URL.
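A minimal Java sketch of the prune semantics and cutoff computation described above, assuming a simplified in-memory store keyed by path. The class InMemoryPruneSketch and its methods are hypothetical, not the S3Guard MetadataStore API. It also assumes the CLI accepts days, hours, minutes and seconds; only -M (minutes) is named in this comment, so the other units are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified in-memory store illustrating prune(timestamp):
 * every entry whose modification time (epoch milliseconds, as returned by
 * System.currentTimeMillis()) is older than the cutoff is removed.
 * This is not the real S3Guard MetadataStore interface.
 */
class InMemoryPruneSketch {
  // path -> modification time in epoch milliseconds
  private final Map<String, Long> modTimes = new HashMap<>();

  void put(String path, long modTimeMillis) {
    modTimes.put(path, modTimeMillis);
  }

  boolean contains(String path) {
    return modTimes.containsKey(path);
  }

  /** Removes every entry modified strictly before cutoffMillis. */
  void prune(long cutoffMillis) {
    modTimes.values().removeIf(modTime -> modTime < cutoffMillis);
  }

  /**
   * Computes the cutoff as described for the CLI: the current time minus the
   * requested days, hours, minutes and seconds (units assumed).
   */
  static long cutoffMillis(long nowMillis, long days, long hours,
                           long minutes, long seconds) {
    long ageMillis = TimeUnit.DAYS.toMillis(days)
        + TimeUnit.HOURS.toMillis(hours)
        + TimeUnit.MINUTES.toMillis(minutes)
        + TimeUnit.SECONDS.toMillis(seconds);
    return nowMillis - ageMillis;
  }
}
```

For example, pruning everything untouched for 90 minutes would use cutoffMillis(System.currentTimeMillis(), 0, 0, 90, 0). An entry modified exactly at the cutoff is kept, since this sketch only removes strictly older entries.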
One thing that probably needs more work is what to do about directories. The
local implementation will delete its record of a directory if all the files it
tracks in that directory get pruned. I should at least do the equivalent for
the DynamoDB implementation, but since there's been some special consideration
for handling empty directories, that may warrant some more thought. I know
[~fabbri] has been thinking about the nuances of empty directories; any thoughts
on that?
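A rough sketch of the directory handling described for the local implementation, using hypothetical names (DirAwarePruneSketch is not LocalMetadataStore). It assumes files are tracked per parent directory and that a directory record is dropped only when the current prune removed its last tracked file. Directories that were already empty are left alone, which is one possible answer to the open empty-directory question rather than settled behavior.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sketch: files are tracked per parent directory. After pruning
 * old files, a directory record is dropped only if this prune removed at least
 * one of its files and none remain. Already-empty directories are untouched.
 */
class DirAwarePruneSketch {
  // directory path -> (child file path -> modification time in epoch ms)
  private final Map<String, Map<String, Long>> dirs = new HashMap<>();

  void addDirectory(String dir) {
    dirs.putIfAbsent(dir, new HashMap<>());
  }

  void addFile(String dir, String file, long modTimeMillis) {
    dirs.computeIfAbsent(dir, d -> new HashMap<>()).put(file, modTimeMillis);
  }

  boolean hasDirectory(String dir) {
    return dirs.containsKey(dir);
  }

  boolean hasFile(String dir, String file) {
    Map<String, Long> children = dirs.get(dir);
    return children != null && children.containsKey(file);
  }

  void prune(long cutoffMillis) {
    Iterator<Map.Entry<String, Map<String, Long>>> it =
        dirs.entrySet().iterator();
    while (it.hasNext()) {
      Map<String, Long> children = it.next().getValue();
      // removeIf returns true if any file in this directory was pruned.
      boolean removedAny = children.values().removeIf(t -> t < cutoffMillis);
      // Drop the directory record only if this prune emptied it.
      if (removedAny && children.isEmpty()) {
        it.remove();
      }
    }
  }
}
```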
All tests pass, except for failures already documented in other JIRAs. For a
while I did have a lot of tests failing on the assertion in
PathMetadataDynamoDBTranslation.pathMetadataToItem that the status is an
S3AFileStatus. We do have a lot of FileStatus instances (FileStatus is
S3AFileStatus's parent class) flying around S3Guard, so I'm surprised I don't
hit it consistently, but today all the tests are passing. I can't see how
anything I've changed in this patch would affect it, so I'm just throwing this
out there in case others have seen it or have any insight.
> CLI command to prune old metadata
> ---------------------------------
>
> Key: HADOOP-14041
> URL: https://issues.apache.org/jira/browse/HADOOP-14041
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14041-HADOOP-13345.001.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune
> metadata that hasn't been modified for an extended period of time. Since the
> primary use-case targeted at the moment is list consistency, it would make
> sense (especially when authoritative=false) to prune metadata that is
> expected to have become consistent a long time ago.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)