[
https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872846#comment-15872846
]
Aaron Fabbri commented on HADOOP-14041:
---------------------------------------
Thanks for the follow-up patch [~mackrorysd]. Looks good.. Of the comments
below, I think the important ones are the prune() method prototype, and errors
going to stderr.
{noformat}
+ public void testPruneDirs() throws Exception {
+ // This test does not necessarily define required behavior: directories
+ // that become empty after a prune operation could be cleaned up, but
+ // currently they don't because if a file was created in that directory
+ // mid-prune, it would violate the invariant that all ancestors of a file
{noformat}
Tiny nit: this invariant is an implementation detail of the dynamo MS. Not a
MetadataStore invariant per se. Could mention the word dynamo here.
{noformat}
+ // exist in the metastore. If an implementation could satisfy this, it
+ // would be okay for this test not to pass.
+ Assume.assumeFalse(ms instanceof NullMetadataStore);
+ createNewDirs("/pruneDirs/dir");
{noformat}
Did you mean to change this Assume to call {{supportsPruning()}}?
Technically, seems like you should use that, and maybe {{allowMissing()}}?
Basically, when allowMissing() returns true, the metadata store may not return
results you just put into it (like a cache where something got evicted before
you asked for it again).
{noformat}
---
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
+++
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
@@ -165,4 +165,15 @@ void move(Collection<Path> pathsToDelete,
Collection<PathMetadata>
* @throws IOException if there is an error
*/
void destroy() throws IOException;
+
+ /**
+ * Clear any metadata older than a specified time from the repository. Note
+ * that modification times should be in UTC, as returned by System
+ * .currentTimeMillis at the time of modification.
+ *
+ * @param modTime Oldest modification time to allow
+ * @throws IOException if there is an error
+ * @throws InterruptedException if the process is interrupted
+ */
+ void prune(long modTime) throws InterruptedException, IOException;
}
{noformat}
Couple of things:
1. We should mention here that implementations: *must* clear any file metadata
older than modTime, *may* clear any directory metadata older than modTime, and
throw an UnsupportedOperationException(*) otherwise?
2. Instead of declaring a checked exception (InterruptedException), IMO, that
should always be wrapped in an IOException.. So this should only be throws
IOException.
(*) [[email protected]] is this the idiomatic thing to do here in Hadoop?
{noformat}
---
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
+++
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
@@ -87,6 +87,10 @@ public void destroy() throws IOException {
}
@Override
+ public void prune(long modTime) throws IOException {
+ }
+
{noformat}
Love the algorithm here. Classic no-op, my fave.
{noformat}
+ if (confDelta <= 0 && cliDelta <= 0) {
+ System.out.println(
+ "You must specify a positive age for metadata to prune.");
+ }
+
{noformat}
I think this should go to stderr (search for "stderr"
[here|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]).
{noformat}
---
a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
+++
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
@@ -51,6 +51,12 @@ public boolean allowMissing() {
return true;
}
+ /** This MetadataStore won't store anything, so there's nothing to prune. */
+ @Override
+ public boolean supportsPruning() {
+ return false;
+ }
{noformat}
This part of the change could be left out, I think? NullMetadataStore always
prunes! Where prune is defined as removing anything older than X.. always true
for empty set. :-)
> CLI command to prune old metadata
> ---------------------------------
>
> Key: HADOOP-14041
> URL: https://issues.apache.org/jira/browse/HADOOP-14041
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14041-HADOOP-13345.001.patch,
> HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch,
> HADOOP-14041-HADOOP-13345.004.patch, HADOOP-14041-HADOOP-13345.005.patch,
> HADOOP-14041-HADOOP-13345.006.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune
> metadata that hasn't been modified for an extended period of time. Since the
> primary use-case targeted at the moment is list consistency, it would make
> sense (especially when authoritative=false) to prune metadata that is
> expected to have become consistent a long time ago.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]