[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819990#comment-15819990
 ] 

Aaron Fabbri commented on HADOOP-13650:
---------------------------------------

Also wanted to comment on the addition of fsck features.  IMHO we should do it
as a separate JIRA.  We have diff, import, and destroy, which together provide
basic tools for diagnosis and repair.  I think we should also have a "fsck
check" command that simply returns a failure code if any invariants are
violated.  In particular, it should fail if a MetadataStore directory is marked
as authoritative but its contents differ from those in S3.  That violates the
"this is the full directory contents" invariant of the
DirListingMetadata#isAuthoritative flag.  Of course, the DynamoDB MetadataStore
does not currently persist the isAuthoritative flag on listings, so for now
this check would always pass.  When we add that feature (which will be needed
for performance improvements), this will be a good tool to see if things have
diverged (e.g. due to a client crashing or concurrent modifications to
overlapping subtrees).
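
A rough sketch of the invariant check described above, assuming plain Java
collections in place of the real MetadataStore and S3 clients.  The class,
method, and exit-code names here are hypothetical and are not part of S3Guard:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of "fsck check"; not an S3Guard API. */
final class FsckCheckSketch {
  static final int EXIT_OK = 0;
  static final int EXIT_INVARIANT_VIOLATED = 1;

  /** Stand-in for a cached directory listing and its isAuthoritative flag. */
  static final class CachedListing {
    final Set<String> children;
    final boolean authoritative;

    CachedListing(Set<String> children, boolean authoritative) {
      this.children = children;
      this.authoritative = authoritative;
    }
  }

  /**
   * Returns a failure code if any authoritative cached listing differs
   * from the listing S3 reports for the same directory.
   * Both maps are keyed by directory path.
   */
  static int check(Map<String, CachedListing> store,
                   Map<String, Set<String>> s3) {
    for (Map.Entry<String, CachedListing> e : store.entrySet()) {
      CachedListing cached = e.getValue();
      if (!cached.authoritative) {
        continue; // non-authoritative listings may legitimately be partial
      }
      Set<String> actual =
          s3.getOrDefault(e.getKey(), Collections.<String>emptySet());
      if (!cached.children.equals(actual)) {
        return EXIT_INVARIANT_VIOLATED;
      }
    }
    return EXIT_OK;
  }
}
```

Only listings flagged authoritative are compared, since a non-authoritative
listing makes no claim to be complete.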

Along those lines, a "fsck fix" command could, for any directory where that
invariant is failing, reload the contents of that directory from S3.  S3's
eventual list consistency could cause false positives here, and "fsck fix"
would then persist the stale listing, so that is a concern.
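
A minimal sketch of that repair step, again using in-memory maps as stand-ins
for the MetadataStore and S3 (all names are hypothetical).  It overwrites each
diverged authoritative listing with whatever S3 currently reports, which is
exactly where a stale S3 listing would get persisted:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of "fsck fix"; not an S3Guard API. */
final class FsckFixSketch {
  /**
   * For every authoritative directory whose cached children differ from
   * the S3 listing, replace the cached children with the S3 listing.
   * Returns the paths of the directories that were reloaded.
   */
  static List<String> fix(Map<String, Set<String>> authoritativeDirs,
                          Map<String, Set<String>> s3) {
    List<String> repaired = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e : authoritativeDirs.entrySet()) {
      Set<String> actual =
          s3.getOrDefault(e.getKey(), Collections.<String>emptySet());
      if (!e.getValue().equals(actual)) {
        // Caveat: if S3's listing is itself stale, this persists the error.
        e.setValue(new TreeSet<>(actual));
        repaired.add(e.getKey());
      }
    }
    return repaired;
  }
}
```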

Note the "fsck check" command could also return failure when a path exists in
the MetadataStore but not in S3.  Again this is subject to eventual list
consistency, and that would need to be documented.  The command could take a
configurable time period after which we assume list consistency is no longer an
issue (e.g. if a two-day-old file exists in the MetadataStore but not in S3,
that is likely *not* due to eventual consistency).
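
One way the time-period idea might look, as a sketch only.  The class name, the
three-way verdict, and the parameter names are assumptions made for
illustration; nothing here is an existing S3Guard option:

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of an age-based grace period; not an S3Guard API. */
final class MissingInS3Sketch {
  enum Verdict { OK, POSSIBLY_INCONSISTENT, VIOLATION }

  /**
   * Classifies a MetadataStore entry given whether S3 currently lists it.
   * An entry missing from S3 counts as a violation only once it is older
   * than the grace period; younger entries may just not be visible yet.
   */
  static Verdict classify(boolean existsInS3, Instant lastModified,
                          Instant now, Duration gracePeriod) {
    if (existsInS3) {
      return Verdict.OK;
    }
    Duration age = Duration.between(lastModified, now);
    return age.compareTo(gracePeriod) > 0
        ? Verdict.VIOLATION
        : Verdict.POSSIBLY_INCONSISTENT;
  }
}
```

With a one-day grace period, a two-day-old entry missing from S3 would be
classified as a violation, while one written a minute ago would not.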


> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13650
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13650
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch, 
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, 
> HADOOP-13650-HADOOP-13345.007.patch, HADOOP-13650-HADOOP-13345.008.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata
> store, i.e., to create or delete the metadata store, or to {{import}} and
> {{sync}} file metadata between the metadata store and S3.
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

