[
https://issues.apache.org/jira/browse/HADOOP-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858874#comment-15858874
]
Aaron Fabbri commented on HADOOP-14041:
---------------------------------------
Thank you for the patch. Overall it looks pretty good. A couple of things
need addressing. The core-default and the InterruptedException are the
important ones.
{noformat}
+ @InterfaceStability.Unstable
+ public static final String S3GUARD_CLI_PRUNE_AGE =
+ "fs.s3a.s3guard.cli.prune.age";
{noformat}
Probably want a snippet in core-default.xml.
{noformat}
+ @InterfaceStability.Unstable
+ public static final String S3GUARD_DDB_BATCH_SLEEP_MSEC_KEY =
+ "fs.s3a.s3guard.ddb.batch.sleep";
+ public static final int S3GUARD_DDB_BATCH_SLEEP_MSEC_DEFAULT = 25;
{noformat}
Same here. Also wondering if we should call this "...ddb.prune.batch.sleep" as
to not cause confusion with stuff like HADOOP-13904. I think prune is going to
remain a special case since it is a background priority job. We could also
call it "...ddb.background.sleep" to future-proof it for other background tasks
(e.g. if we introduced an background scrubber or integrity checker?
{noformat}
+ deletionBatch.add(path);
+ if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
+ Thread.sleep(delay);
+ processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+ }
+ } catch (IOException e) {
+ LOG.error(e.getMessage());
+ }
+ if (deletionBatch.size() > 0) {
+ Thread.sleep(delay);
+ processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
+ }
{noformat}
Minor nit: I would make sleep happen between batches (not before the first).
e.g.
{noformat}
long batchCount = 0;
...
deletionBatch.add(path);
if (deletionBatch.size() == S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT) {
if (batchCount++ > 0) { // don't sleep before first batch
Thread.sleep(delay);
}
processBatchWriteRequest(pathToKey(deletionBatch), new Item[0]);
...
{noformat}
You could also use that for an interesting log message {{LOG.debug("Finished
processing {} batches", batchCount);}}
{noformat}
+ } catch (InterruptedException e) {
+ LOG.warn("Pruning operation was interrupted!");
+ }
{noformat}
You need to propagate this exception, or set the threads' interrupt status.
> CLI command to prune old metadata
> ---------------------------------
>
> Key: HADOOP-14041
> URL: https://issues.apache.org/jira/browse/HADOOP-14041
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14041-HADOOP-13345.001.patch,
> HADOOP-14041-HADOOP-13345.002.patch, HADOOP-14041-HADOOP-13345.003.patch
>
>
> Add a CLI command that allows users to specify an age at which to prune
> metadata that hasn't been modified for an extended period of time. Since the
> primary use-case targeted at the moment is list consistency, it would make
> sense (especially when authoritative=false) to prune metadata that is
> expected to have become consistent a long time ago.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]