[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HADOOP-13650:
-----------------------------------
    Attachment: HADOOP-13650-HADOOP-13345.003.patch

Thanks a lot for the feedback and suggestions, [[email protected]], [~aw] and 
[~cnauroth].

Uploaded a new patch to address the comments, rewrote the shell script following 
the example of {{distcp}}, and added tests for the {{init/destroy}} metadata 
store commands.

bq.  Ideally return a different exit code for an exception
Done

bq. we have the option of JCommander here for arg parsing.

Hi, [[email protected]], I did not see {{JCommander}} used anywhere in Hadoop, so I 
followed the NameNode disk balancer code and used {{CommandFormat}}.  
Would that be OK?

bq. might be good to have the option of printing the diff out in a way that's 
easy to parse downstream. 

Currently the {{diff}} output is tab-separated, similar to the {{oiv}} tool's 
delimited output. I can add {{XML/JSON}} output as a follow-on JIRA.
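For downstream consumers, tab-separated lines are straightforward to split. A minimal sketch of such parsing; the class name and the column layout here ({{source}}, {{path}}, {{length}}) are illustrative assumptions, not the actual format emitted by the patch:

```java
import java.util.Arrays;
import java.util.List;

public class DiffLineParser {
  // Hypothetical layout: source<TAB>path<TAB>length.
  // The -1 limit keeps trailing empty columns instead of dropping them.
  static List<String> parse(String line) {
    return Arrays.asList(line.split("\t", -1));
  }

  public static void main(String[] args) {
    List<String> cols = parse("S3\t/data/part-0000\t1024");
    System.out.println(cols);  // prints [S3, /data/part-0000, 1024]
  }
}
```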

bq. Maybe an operation to verify that the metastore is in sync with s3,

Would a {{-q/--quiet}} option to {{diff}}, with a non-zero return value, be 
sufficient? Should it return immediately when the first difference is found?

bq. For the comparison, a listFiles(recursive=true) is much faster to list s3 
buckets...

This is a good suggestion, but it might be difficult to do in the near future. 
First, both {{LocalMetadataStore}} and {{DynamoDBMetadataStore}} currently use 
hash distribution for directories, which cannot guarantee the order of the 
returned results. Second, IIUC, using {{listFiles()}} recursively in {{O(1)}} 
space means keeping two iterators, one on S3 and one on the MetadataStore. 
Given that either side may be missing a sub-namespace, deciding which iterator 
to advance when the files the iterators point to differ is harder to implement 
correctly. Should we do this optimization after merging to trunk?
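For reference, if both listings could be returned in sorted order, the two-iterator comparison above would reduce to a standard sorted merge. A sketch under that assumption (the class and method names are hypothetical; the whole point of the paragraph above is that the hash-distributed stores do not currently give us this ordering):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SortedDiff {
  /**
   * Compare two path listings, each sorted lexicographically. Only the
   * iterator holding the smaller key is advanced, so the walk uses O(1)
   * extra space beyond the output. Paths present on one side only are
   * reported; matching paths are silently skipped.
   */
  static List<String> diff(Iterator<String> s3, Iterator<String> ms) {
    List<String> out = new ArrayList<>();
    String a = s3.hasNext() ? s3.next() : null;
    String b = ms.hasNext() ? ms.next() : null;
    while (a != null || b != null) {
      if (b == null || (a != null && a.compareTo(b) < 0)) {
        out.add("S3-only\t" + a);                 // missing from MetadataStore
        a = s3.hasNext() ? s3.next() : null;
      } else if (a == null || a.compareTo(b) > 0) {
        out.add("MS-only\t" + b);                 // missing from S3
        b = ms.hasNext() ? ms.next() : null;
      } else {                                    // equal keys: in sync
        a = s3.hasNext() ? s3.next() : null;
        b = ms.hasNext() ? ms.next() : null;
      }
    }
    return out;
  }
}
```

Without the sorted-order guarantee, advancing "the smaller" iterator is undefined, which is exactly why the unordered case is hard to implement correctly in O(1) space.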


Hi, [~liuml07], as mentioned in the [parent 
thread|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=15801188&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15801188],
 I think that, from the CLI-arguments perspective, {{init|destroy}} should 
accept either a metadata store URI or an s3a URI. I propose the following CLI 
parameters:

{code}
hadoop s3a init [-r UNIT] [-w UNIT] <-g REGION -m dynamodb://table | s3a://bucket>
hadoop s3a destroy <-g REGION -m dynamodb://table | s3a://bucket>
{code}
 What do you think? If we do so, we will need non-trivial changes in S3A and 
the DDB MetadataStore, and we should file another JIRA for the change.

Thanks. 

> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata store, 
> i.e., create or delete the metadata store, or {{import}} / {{sync}} file 
> metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
