[
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lei (Eddy) Xu updated HADOOP-13650:
-----------------------------------
Attachment: HADOOP-13650-HADOOP-13345.003.patch
Thanks a lot for the feedback and suggestions, [[email protected]], [~aw] and
[~cnauroth].
I uploaded a new patch to address the comments, rewrote the shell script
following the example of {{distcp}}, and added tests for {{init/destroy}} of
the metadata store.
bq. Ideally return a different exit code for an exception
Done
bq. we have the option of JCommander here for arg parsing.
Hi, [[email protected]], I did not see {{JCommander}} being used in Hadoop, so I
followed the code used in the NameNode disk balancer and used
{{CommandFormat}}. Would that be OK?
bq. might be good to have the option of printing the diff out in a way that's
easy to parse downstream.
Currently the {{diff}} output is tab-separated, similar to the {{oiv}} tool's
delimited output. I can add {{XML/JSON}} output as a follow-on JIRA.
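For downstream parsing, tab-separated output can already be consumed with standard tools. A minimal sketch, assuming each diff line is a record of the form {{SIDE<TAB>path}} (the exact column layout is an assumption here, not the committed format):

```shell
# Sketch: select paths reported only on the S3 side of a tab-separated diff.
# The SIDE<TAB>path column layout is assumed for illustration.
printf 'S3\t/bucket/a.txt\nMS\t/bucket/b.txt\n' \
  | awk -F'\t' '$1 == "S3" { print $2 }'
# prints /bucket/a.txt
```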
bq. Maybe an operation to verify that the metastore is in sync with s3,
Would a {{-q/--quiet}} option to {{diff}} with a non-zero return value be
sufficient? Should it return immediately when the first difference is found?
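The exit-code approach would let scripts check sync status without parsing any output. A sketch of the intended usage pattern, with the proposed subcommand stubbed out since the {{-q}} flag is only a proposal at this point:

```shell
# Stand-in stub for the proposed "hadoop s3a diff -q" (hypothetical flag):
# it would return non-zero once a difference is found.
s3a_diff_q() { return 1; }   # stub: pretend a difference was found

if s3a_diff_q s3a://bucket; then
  echo "metadata store in sync"
else
  echo "metadata store out of sync"
fi
# prints: metadata store out of sync
```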
bq. For the comparison, a listFiles(recursive=true) is much faster to list s3
buckets...
This is a good suggestion, but it might be difficult to do in the near future.
First, both {{LocalMetadataStore}} and {{DynamoDBMetadataStore}} currently use
hash distribution for directories, which cannot guarantee the order of
returned results. Second, IIUC, using {{listFiles()}} recursively with
{{O(1)}} space means keeping two iterators, one on S3 and one on the
MetadataStore. Given that either side may be missing a sub-namespace,
deciding which iterator to advance when the files they point to differ is
difficult to implement correctly. Should we do this optimization after
merging to trunk?
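To illustrate why ordering is the sticking point: if both sides did emit entries in sorted order, the streaming comparison would reduce to a textbook merge, e.g. what {{comm(1)}} does over two sorted listings. A sketch with made-up sample listings (the file names and paths are illustrative only):

```shell
# Sketch: with both listings sorted, a streaming merge comparison is
# trivial; comm(1) advances whichever side is "behind" automatically.
printf 's3a://b/a\ns3a://b/c\n' > /tmp/s3.list   # sample S3 listing
printf 's3a://b/a\ns3a://b/b\n' > /tmp/ms.list   # sample MetadataStore listing
comm -3 /tmp/s3.list /tmp/ms.list                # show lines unique to either side
```

With hash-distributed results, neither input is sorted, so this advance-the-smaller-key rule has no key to compare, which is exactly the difficulty described above.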
Hi, [~liuml07] As mentioned in the [parent
thread|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=15801188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15801188],
I think that, from the CLI arguments aspect, {{init|destroy}} should accept
either a metadata store URI or an s3a URI. I propose the following CLI
parameters:
{code}
hadoop s3a init [-r UNIT] [-w UNIT] <-g REGION -m dynamodb://table | s3a://bucket>
hadoop s3a destroy <-g REGION -m dynamodb://table | s3a://bucket>
{code}
What do you think? If we do so, we will need non-trivial changes in S3A and
the DynamoDB MetadataStore, and we should file another JIRA for that change.
Thanks.
> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch,
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch,
> HADOOP-13650-HADOOP-13345.003.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata
> store, e.g., create or delete the metadata store, or {{import}}/{{sync}}
> file metadata between the metadata store and S3.
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)