[ 
https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872846#comment-15872846
 ] 

Aaron Fabbri commented on HADOOP-14041:
---------------------------------------

Thanks for the follow-up patch [~mackrorysd].  Looks good.. Of the comments 
below, I think the important ones are the prune() method prototype, and errors 
going to stderr.

{noformat}
+  public void testPruneDirs() throws Exception {
+    // This test does not necessarily define required behavior: directories
+    // that become empty after a prune operation could be cleaned up, but
+    // currently they don't because if a file was created in that directory
+    // mid-prune, it would violate the invariant that all ancestors of a file
{noformat}

Tiny nit: this invariant is an implementation detail of the dynamo MS.  Not a 
MetadataStore invariant per se.  Could mention the word dynamo here.

{noformat}
+    // exist in the metastore. If an implementation could satisfy this, it
+    // would be okay for this test not to pass.
+    Assume.assumeFalse(ms instanceof NullMetadataStore);
+    createNewDirs("/pruneDirs/dir");
{noformat}

Did you mean to change this Assume to call {{supportsPruning()}}?
Technically, seems like you should use that, and maybe {{allowMissing()}}?  
Basically, when allowMissing() returns true, the metadata store may not return 
results you just put into it (like a cache where something got evicted before 
you asked for it again).

{noformat}
--- 
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
+++ 
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
@@ -165,4 +165,15 @@ void move(Collection<Path> pathsToDelete, 
Collection<PathMetadata>
    * @throws IOException if there is an error
    */
   void destroy() throws IOException;
+
+  /**
+   * Clear any metadata older than a specified time from the repository. Note
+   * that modification times should be in UTC, as returned by System
+   * .currentTimeMillis at the time of modification.
+   *
+   * @param modTime Oldest modification time to allow
+   * @throws IOException if there is an error
+   * @throws InterruptedException if the process is interrupted
+   */
+  void prune(long modTime) throws InterruptedException, IOException;
 }
{noformat}
Couple of things:
1. We should mention here that implementations:  *must* clear any file metadata 
older than modTime, *may* clear any directory metadata older than modTime, and 
throw an UnsupportedOperationException(*) otherwise?
2. Instead of declaring a checked exception (InterruptedException), IMO, that 
should always be wrapped in an IOException.. So this should only be throws 
IOException.

(*) [[email protected]] is this the idiomatic thing to do here in Hadoop?

{noformat}
--- 
a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
+++ 
b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
@@ -87,6 +87,10 @@ public void destroy() throws IOException {
   }
  
   @Override
+  public void prune(long modTime) throws IOException {
+  }
+
{noformat}
Love the algorithm here.   Classic no-op, my fave.

{noformat}
+      if (confDelta <= 0 && cliDelta <= 0) {
+        System.out.println(
+            "You must specify a positive age for metadata to prune.");
+      }
+
{noformat}
I think this should go to stderr (search for "stderr" 
[here|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]).

{noformat}
--- 
a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
+++ 
b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestNullMetadataStore.java
@@ -51,6 +51,12 @@ public boolean allowMissing() {
     return true;
   }
  
+  /** This MetadataStore won't store anything, so there's nothing to prune. */
+  @Override
+  public boolean supportsPruning() {
+    return false;
+  }
{noformat}

This part of the change could be left out, I think?  NullMetadataStore always 
prunes!  Where prune is defined as removing anything older than X.. always true 
for empty set.  :-)

> CLI command to prune old metadata
> ---------------------------------
>
>                 Key: HADOOP-14041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14041
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14041-HADOOP-13345.001.patch, 
> HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch, 
> HADOOP-14041-HADOOP-13345.004.patch, HADOOP-14041-HADOOP-13345.005.patch, 
> HADOOP-14041-HADOOP-13345.006.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune 
> metadata that hasn't been modified for an extended period of time. Since the 
> primary use-case targeted at the moment is list consistency, it would make 
> sense (especially when authoritative=false) to prune metadata that is 
> expected to have become consistent a long time ago.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to