[ https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434139#comment-15434139 ]
Aaron Fabbri commented on HADOOP-13345:
---------------------------------------
Having the MetadataStore interface is an important first step for us to
parallelize our effort here. Thanks again Chris for getting that first patch
out.
I still have questions about the subtasks though. There is still some fuzziness
with respect to the policy part. (We may want to have a conf. call to
discuss--and I'm open tomorrow.)
I've been thinking about policy a little and I believe:
- Allowing MetadataStore implementations to opt in/out of being source of truth
is important. Implementations may wish to opt out based on implementation
complexity, or lack of transactions for underlying store, or policy (LRU
discard).
- Allowing the client to opt out of relying on MetadataStore as source of truth
is also desirable, for example for workloads that add files outside of Hadoop.
Opting out is also less risky while we stabilize the codebase.
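To make the two opt-outs concrete, here is a minimal Java sketch. Every name in
it (DirListing, MetadataStoreSketch, ListingPolicy) is hypothetical and is not
taken from the patch. It only shows the rule implied above: the client skips
the backing store only when it allows authoritative results AND the store
marked the result authoritative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical listing result; stands in for "DirectoryListMetadata (name tbd)". */
final class DirListing {
    final List<String> children;
    /** Set by the store. A store that opts out always leaves this false. */
    final boolean authoritative;

    DirListing(List<String> children, boolean authoritative) {
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
        this.authoritative = authoritative;
    }
}

/** Hypothetical, trimmed-down stand-in for the MetadataStore interface. */
interface MetadataStoreSketch {
    /** Returns null when the store has no entry for the directory. */
    DirListing listChildren(String path);
}

final class ListingPolicy {
    /**
     * True only if both sides opted in: the client allows authoritative
     * results and the store marked this particular result authoritative.
     */
    static boolean canSkipBackingStore(boolean clientAllowsAuthoritative,
                                       DirListing cached) {
        return clientAllowsAuthoritative
            && cached != null
            && cached.authoritative;
    }
}

class PolicyDemo {
    public static void main(String[] args) {
        // A store that opts out (e.g. LRU discard) never claims authority.
        MetadataStoreSketch lruStore =
            path -> new DirListing(Arrays.asList("part-0000"), false);
        // A store that opts in and holds the complete listing.
        MetadataStoreSketch fullStore =
            path -> new DirListing(Arrays.asList("part-0000", "part-0001"), true);

        System.out.println(ListingPolicy.canSkipBackingStore(
            true, lruStore.listChildren("/data")));
        System.out.println(ListingPolicy.canSkipBackingStore(
            true, fullStore.listChildren("/data")));
        // Client opted out: the store's claim is ignored.
        System.out.println(ListingPolicy.canSkipBackingStore(
            false, fullStore.listChildren("/data")));
    }
}
```

With this shape, either side opting out degrades to "always consult the backing
store", which is the safe default while the code stabilizes.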
This implies some configuration parameters (ignoring the naming for now--I
assume a future where this is factored out of s3a for any FS client to utilize):
fs.<client>.metadatastore.allow.authoritative
- If true, allow configured metadata store (if any) to be source of truth on
cached file metadata and directory listings.
- If true, but configured metadata store does not support being authoritative,
this setting will have no effect, as the MetadataStore will always return
results marked as non-authoritative.
fs.<client>.metadatastore.class
- Configure which MetadataStore implementation to use, if any.
- This may replace fs.s3a.s3guard.enabled proposed in the doc?
fs.metadatastore.<impl-name>.fullycache.directories
- If the metadata store implementation supports being authoritative on
directory listings, this will cause it to return DirectoryListMetadata (name
tbd) results with fullyCached=true when it has complete directory listing.
- If metadata store implementation does not support this, it should log an
error. Client will work correctly, as implementation will never claim to fully
cache listings / PathMetadata.
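As a rough illustration of how the three settings could combine, here is a
Java sketch using java.util.Properties in place of Hadoop's Configuration. The
concrete key names substitute "s3a" for <client> and "example" for <impl-name>,
the class name org.example.ExampleMetadataStore is made up, and the "false"
defaults are my assumption; none of this is settled.

```java
import java.util.Properties;

/** Hypothetical helper; key names follow the proposal above, not a real API. */
final class MetadataStoreConfSketch {
    // <client> = s3a and <impl-name> = example are placeholders.
    static final String ALLOW_AUTHORITATIVE =
        "fs.s3a.metadatastore.allow.authoritative";
    static final String STORE_CLASS =
        "fs.s3a.metadatastore.class";
    static final String FULLY_CACHE_DIRS =
        "fs.metadatastore.example.fullycache.directories";

    /**
     * Returns true if directory listings from the store may be treated as
     * source of truth. Defaults of "false" are an assumption of this sketch.
     *
     * @param storeSupportsAuthoritative whether the configured implementation
     *        is able to track complete directory listings at all
     */
    static boolean listingsMayBeAuthoritative(Properties conf,
                                              boolean storeSupportsAuthoritative) {
        String storeClass = conf.getProperty(STORE_CLASS, "").trim();
        if (storeClass.isEmpty()) {
            return false; // no metadata store configured
        }
        boolean clientAllows = Boolean.parseBoolean(
            conf.getProperty(ALLOW_AUTHORITATIVE, "false"));
        boolean fullyCache = Boolean.parseBoolean(
            conf.getProperty(FULLY_CACHE_DIRS, "false"));
        if (fullyCache && !storeSupportsAuthoritative) {
            // Unsupported setting: log an error, keep working non-authoritatively.
            System.err.println("ERROR: " + FULLY_CACHE_DIRS
                + " is set but " + storeClass + " does not support it");
        }
        return clientAllows && fullyCache && storeSupportsAuthoritative;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(STORE_CLASS, "org.example.ExampleMetadataStore");
        conf.setProperty(ALLOW_AUTHORITATIVE, "true");
        conf.setProperty(FULLY_CACHE_DIRS, "true");

        System.out.println(listingsMayBeAuthoritative(conf, true));
        System.out.println(listingsMayBeAuthoritative(conf, false));
    }
}
```

The point of the sketch is that a misconfiguration never breaks the client: an
unsupported or missing setting only means results are treated as
non-authoritative.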
We could name this ...<impl-name>.authoritative.directories instead. We could
also add an analogue for files: ...<impl-name>.authoritative.files. In my
prototype I assumed get() on a single Path could always be authoritative. I
could go either way.
Thoughts?
> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch,
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf,
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a
> stronger consistency model than what is currently offered. The solution
> coordinates with a strongly consistent external store to resolve
> inconsistencies caused by the S3 eventual consistency model.