[ 
https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14041:
-----------------------------------
    Attachment: HADOOP-14041-HADOOP-13345.001.patch

Attaching a patch that adds prune(timestamp) to the MetadataStore interface and 
its existing implementations, along with a CLI tool and tests for all of that. 
prune() takes a UTC timestamp in milliseconds, as returned by 
System.currentTimeMillis(), and should trim every entry whose modification time 
is older than that. The CLI tool determines the timestamp by taking the current 
time and subtracting the lengths of time given on the command line. One tricky 
thing: minutes are specified with -M, and all the time-range options are in 
caps so that -M doesn't clash with -m, which specifies the metastore URL.
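
As a rough illustration of the cutoff computation described above, here is a 
minimal sketch. The class and method names are hypothetical and not taken from 
the patch, and the choice of days, hours and minutes as the units is an 
assumption (only the minutes option is named above).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; names are illustrative, not from the patch. */
final class PruneCutoff {
    private PruneCutoff() {}

    /**
     * Returns the cutoff in epoch milliseconds. Entries whose modification
     * time is older than this value would be candidates for pruning.
     * The set of units (days, hours, minutes) is an assumption.
     */
    static long cutoffMillis(long nowMillis, long days, long hours, long minutes) {
        long age = TimeUnit.DAYS.toMillis(days)
                 + TimeUnit.HOURS.toMillis(hours)
                 + TimeUnit.MINUTES.toMillis(minutes);
        return nowMillis - age;
    }

    public static void main(String[] args) {
        // Same clock that prune() expects its timestamp to come from.
        long now = System.currentTimeMillis();
        long cutoff = cutoffMillis(now, 1, 0, 30);
        System.out.println("prune everything modified before " + cutoff);
    }
}
```

Passing the current time in as a parameter, rather than reading the clock 
inside the method, keeps the arithmetic easy to test.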

One thing that probably needs more work is what to do about directories. The 
local implementation will delete its record of a directory if all the files it 
tracks in that directory get pruned. I should at least do the equivalent for 
the DynamoDB implementation, but since there's been some special consideration 
for handling empty directories, that may warrant some more thought. I know 
[~fabbri]'s been thinking about the nuances of empty directories - any thoughts 
on that?
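
To make the directory behaviour above concrete, here is a toy in-memory 
sketch. It is not the S3Guard MetadataStore API or the local implementation 
from the patch: the class, its methods, and the flat one-level directory model 
are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy store; not the real MetadataStore interface. */
final class ToyMetadataStore {
    // file path -> modification time in epoch milliseconds
    private final Map<String, Long> files = new HashMap<>();
    // directory path -> paths of tracked files directly inside it
    private final Map<String, Set<String>> dirs = new HashMap<>();

    void putFile(String dir, String name, long modTime) {
        String path = dir + "/" + name;
        files.put(path, modTime);
        dirs.computeIfAbsent(dir, d -> new HashSet<>()).add(path);
    }

    /**
     * Removes every file older than the cutoff, then drops the record of
     * any directory that is left with no tracked files.
     */
    void prune(long cutoffMillis) {
        Iterator<Map.Entry<String, Set<String>>> it = dirs.entrySet().iterator();
        while (it.hasNext()) {
            Set<String> children = it.next().getValue();
            children.removeIf(path -> {
                if (files.get(path) < cutoffMillis) {
                    files.remove(path);
                    return true;
                }
                return false;
            });
            if (children.isEmpty()) {
                it.remove(); // all tracked files pruned: forget the directory too
            }
        }
    }

    boolean hasFile(String path) { return files.containsKey(path); }

    boolean hasDir(String dir) { return dirs.containsKey(dir); }
}
```

This sketch only looks at direct children and ignores nested directories, 
which is part of what would need more thought in a real store.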

All tests pass except for the failures already documented in other JIRAs. For 
a while, a lot of tests were failing at the assertion of type S3AFileStatus in 
PathMetadataDynamoDBTranslation.pathMetadataToItem. We do have a lot of 
instances of FileStatus (S3AFileStatus's parent class) flying around S3Guard, 
so I'm surprised I don't hit it consistently, but today all the tests are 
passing. I can't see how anything I've changed while working on this patch 
would affect it, so I'm just throwing this out there in case others have seen 
it or have any insight.

> CLI command to prune old metadata
> ---------------------------------
>
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch
>
>
> Add a CLI command that lets users specify an age beyond which metadata that 
> hasn't been modified is pruned. Since the primary use-case targeted at the 
> moment is list consistency, it would make sense (especially when 
> authoritative=false) to prune metadata that is expected to have become 
> consistent a long time ago.


