[
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819990#comment-15819990
]
Aaron Fabbri commented on HADOOP-13650:
---------------------------------------
Also wanted to comment on the addition of fsck features. IMHO we should do it
as a separate JIRA. We have diff, import, and destroy, which together provide
basic tools for diagnosis and repair.

I think we should also have a "fsck check" command that simply returns a
failure code if any invariants are violated. In particular, it should fail if
a MetadataStore directory is marked as authoritative and its contents differ
from those in S3. That violates the "this is the full directory contents"
invariant of the DirListingMetadata#isAuthoritative flag. Of course, the
DynamoDB MS does not currently persist the isAuthoritative flag on listings,
so for now this check would always pass. When we add that feature (which will
be needed for performance improvements), this will be a good tool to see if
things have diverged (e.g. due to a client crashing or concurrent
modifications to overlapping subtrees).
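
To make the proposed check concrete, here is a rough sketch of the comparison
for a single directory. It is not S3Guard code: the class, the method names,
and the exit codes are all made up for illustration, and the two listings are
passed in as plain sets of child names rather than read from a real
MetadataStore or from S3.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a "fsck check" comparison; not an S3Guard API. */
final class FsckCheckSketch {
  private FsckCheckSketch() {}

  /** Exit codes are assumptions made for this example. */
  static final int EXIT_OK = 0;
  static final int EXIT_INVARIANT_VIOLATED = 1;

  /**
   * Checks one directory. Only a listing marked authoritative promises to be
   * the full directory contents, so a non-authoritative listing always passes.
   */
  static int checkDirectory(boolean authoritative,
      Set<String> metadataStoreChildren, Set<String> s3Children) {
    if (!authoritative) {
      return EXIT_OK;
    }
    return metadataStoreChildren.equals(s3Children)
        ? EXIT_OK : EXIT_INVARIANT_VIOLATED;
  }

  /** Returns the child names that appear in exactly one of the listings. */
  static Set<String> symmetricDifference(Set<String> a, Set<String> b) {
    Set<String> union = new TreeSet<>(a);
    union.addAll(b);
    Set<String> common = new TreeSet<>(a);
    common.retainAll(b);
    union.removeAll(common);
    return union;
  }
}
```

The symmetric difference is there only so a failing check could report which
children disagree; the pass/fail decision does not depend on it.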

Along those lines, a "fsck fix" command could, for any directory where that
invariant fails, reload the contents of that directory from S3. Eventual list
consistency could cause false positives here, which "fsck fix" would then
persist, so that is a concern.
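
A minimal sketch of the "fsck fix" idea, using an in-memory map as a stand-in
for the MetadataStore. Every name here is hypothetical. Note that the sketch
trusts whatever S3 listing it is handed, which is exactly the
eventual-consistency concern: a stale listing would be written back as if it
were correct.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of "fsck fix"; the map stands in for a MetadataStore. */
final class FsckFixSketch {
  /** Directory path -> child names, for listings marked authoritative. */
  private final Map<String, Set<String>> authoritativeListings = new HashMap<>();

  void putAuthoritative(String dir, Set<String> children) {
    authoritativeListings.put(dir, new TreeSet<>(children));
  }

  Set<String> get(String dir) {
    return authoritativeListings.get(dir);
  }

  /**
   * Replaces the stored listing with the S3 listing when the two differ.
   * Returns true if the stored listing was rewritten. Directories with no
   * authoritative listing are left alone.
   */
  boolean fix(String dir, Set<String> s3Children) {
    Set<String> stored = authoritativeListings.get(dir);
    if (stored == null || stored.equals(s3Children)) {
      return false;
    }
    authoritativeListings.put(dir, new TreeSet<>(s3Children));
    return true;
  }
}
```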

Note that the "fsck check" command could also return failure when a path
exists in the MetadataStore but not in S3. Again, this is subject to eventual
list consistency, and that would need to be documented. It could have a
configurable time period after which we assume list consistency is no longer
an issue (e.g. if a two-day-old file exists in the MetadataStore but not in
S3, that is likely *not* due to eventual consistency).
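
The configurable time period could work roughly like this. Again a sketch
only: the class, the enum, and the idea of using the entry's modification time
as its age are assumptions for illustration, not anything in the patch.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the grace-period rule; not an S3Guard API. */
final class MissingInS3Sketch {
  private MissingInS3Sketch() {}

  enum Verdict { OK, POSSIBLY_INCONSISTENT_LISTING, VIOLATION }

  /**
   * Classifies a path that is present in the MetadataStore. A path missing
   * from S3 counts as a violation only once the entry is older than the
   * grace period; younger entries may just be hidden by list inconsistency.
   */
  static Verdict classify(boolean existsInS3, Instant entryModified,
      Instant now, Duration gracePeriod) {
    if (existsInS3) {
      return Verdict.OK;
    }
    Duration age = Duration.between(entryModified, now);
    return age.compareTo(gracePeriod) > 0
        ? Verdict.VIOLATION
        : Verdict.POSSIBLY_INCONSISTENT_LISTING;
  }
}
```

With a one-day grace period, the two-day-old file from the example above would
be reported as a violation, while a file written ten minutes ago would not.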
> S3Guard: Provide command line tools to manipulate metadata store.
> -----------------------------------------------------------------
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch,
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch,
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch,
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch,
> HADOOP-13650-HADOOP-13345.007.patch, HADOOP-13650-HADOOP-13345.008.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata
> store, i.e., to create or delete the metadata store, or to {{import}} and
> {{sync}} the file metadata between the metadata store and S3.
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)