[
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857029#comment-16857029
]
Aaron Fabbri edited comment on HADOOP-13980 at 6/5/19 8:55 PM:
---------------------------------------------------------------
Thanks for your draft of FSCK requirements [[email protected]]. This is a good
start.
One thing that comes to mind: I don't know that we want to consider "auth mode"
as a factor here. Erring on the side of over-explaining this stuff for clarity:
There are two main authoritative mode flags in play:
(1) per-directory metastore bit that says "this directory is fully loaded into
the metastore"
(2) s3a client config bit fs.s3a.metadatastore.authoritative, which allows s3a
to short-circuit (skip) s3 on some metadata queries. This one is just a runtime
client behavior flag. You could have multiple clients with different settings
sharing a bucket. FSCK could also have a different config. I think you'll
still want some FSCK options to select the level of enforcement / paranoia as
you outline; I just don't think it needs to be conflated with the client's
allow-auth flag. I'd imagine this as a growing set of invariant checks that can
be categorized into something like basic / paranoid / full.
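To make the "growing set of invariant checks" idea concrete, here is a rough
sketch in Python. Everything in it is hypothetical (the Level names, the Check
tuple, run_fsck); none of it is existing S3Guard code, and it assumes the
levels are cumulative (paranoid includes basic, full includes both).

```python
from enum import IntEnum
from typing import Callable, Dict, List, NamedTuple


class Level(IntEnum):
    """Hypothetical enforcement levels; assumed cumulative."""
    BASIC = 1
    PARANOID = 2
    FULL = 3


class Check(NamedTuple):
    name: str
    level: Level
    run: Callable[[], List[str]]  # returns a list of violation messages


def select_checks(checks: List[Check], level: Level) -> List[Check]:
    """Return the checks enabled at the requested level."""
    return [c for c in checks if c.level <= level]


def run_fsck(checks: List[Check], level: Level) -> Dict[str, List[str]]:
    """Run the selected checks and collect violations by check name."""
    violations: Dict[str, List[str]] = {}
    for check in select_checks(checks, level):
        found = check.run()
        if found:
            violations[check.name] = found
    return violations
```

The level is chosen by the FSCK invocation itself, independent of whatever
fs.s3a.metadatastore.authoritative is set to on any client.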
Whether or not an s3a client has the metadatastore.authoritative bit set in its
config doesn't really affect the contents of the metadata store or its
relationship to the underlying storage (s3) state*. If the is_authoritative
bit is set on a directory in the metastore, however, that directory's listing
from the metadata store should *match* the listing of that dir from s3. If the
bit is not set, the metastore listing should be a subset of the s3 listing.
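The match / subset rule above could be checked per directory roughly like
this. It is a sketch only: check_listing and its parameters are made-up names,
listings are modeled as plain sets of child names, and it ignores deletion
markers and any S3 listing lag.

```python
from typing import List, Set


def check_listing(path: str,
                  metastore_children: Set[str],
                  s3_children: Set[str],
                  is_authoritative: bool) -> List[str]:
    """Return violation messages for one directory (empty list means OK)."""
    violations = []
    # In both cases the metastore listing must not contain entries S3 lacks.
    for child in sorted(metastore_children - s3_children):
        violations.append(f"{path}: {child} in metastore but not in S3")
    # An authoritative directory must additionally contain everything in S3.
    if is_authoritative:
        for child in sorted(s3_children - metastore_children):
            violations.append(
                f"{path}: {child} in S3 but missing from authoritative listing")
    return violations
```

For example, check_listing("/d", {"a"}, {"a", "b"}, False) reports nothing,
while the same call with is_authoritative=True flags "b" as missing.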
I would also split the consistency checks into two categories:
MetadataStore-specific, and generic. The majority of the checks here are
generic tests that work with any MetadataStore. DDB also needs to check its
internal consistency (since it uses the ancestor-exists invariant to avoid
table scans).
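As an illustration of the DDB-specific side, an ancestor-exists check over the
paths returned by a table scan might look like the following. This is a sketch
under assumptions: missing_ancestors is a hypothetical helper, paths are
absolute POSIX-style strings, and the root is treated as implicitly present.

```python
import posixpath
from typing import Iterable, List


def missing_ancestors(paths: Iterable[str]) -> List[str]:
    """Return ancestor directories that have no entry of their own."""
    present = set(paths)
    missing = set()
    for path in present:
        parent = posixpath.dirname(path)
        # Walk up toward the root, recording every absent ancestor.
        while parent not in ("/", ""):
            if parent not in present:
                missing.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(missing)
```

With entries for "/a" and "/a/b/c" only, this reports "/a/b" as the missing
ancestor.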
Also agreed you'll need table scans here, but how do we expose this for FSCK
only? FSCK traditionally reaches below the FS to check its structures (e.g.
ext3 fsck uses the block device below the ext3 fs to check the on-disk format,
right?).
* Some nuance here, if we want to discuss further.
> S3Guard CLI: Add fsck check command
> -----------------------------------
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Aaron Fabbri
> Assignee: Gabor Bota
> Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which
> compares S3 with MetadataStore, and returns a failure status if any
> invariants are violated.