[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434139#comment-15434139
 ] 

Aaron Fabbri commented on HADOOP-13345:
---------------------------------------

Having the MetadataStore interface is an important first step for us to 
parallelize our effort here.  Thanks again Chris for getting that first patch 
out.  

I still have questions about the subtasks though. There is still some fuzziness 
with respect to the policy part.  (We may want to have a conf. call to 
discuss--and I'm open tomorrow.)

I've been thinking about policy a little and I believe:

- Allowing MetadataStore implementations to opt in or out of being the source 
of truth is important.  An implementation may wish to opt out because of 
implementation complexity, a lack of transactions in the underlying store, or 
its policy (e.g. LRU discard).

- Allowing the client to opt out of relying on the MetadataStore as the source 
of truth is also desirable, for example for workloads that add files outside of 
Hadoop.  Opting out is also less risky while we stabilize the codebase.

This implies some configuration parameters (ignoring the naming for now--I 
assume a future where this is factored out of s3a for any FS client to utilize):

fs.<client>.metadatastore.allow.authoritative
- If true, allow the configured metadata store (if any) to be the source of 
  truth for cached file metadata and directory listings.
- If true, but the configured metadata store does not support being 
  authoritative, this setting has no effect, as the MetadataStore will always 
  return results marked as non-authoritative.
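
To make the interaction concrete, here is a minimal sketch of how a client 
might combine this setting with the flag on a result.  ListingResult, 
AuthoritativePolicy and canSkipBackingStore are hypothetical names for 
illustration only; they are not part of the current patch.

```java
import java.util.List;

/** Hypothetical result type: a cached listing plus the store's authoritative flag. */
final class ListingResult {
    final List<String> children;
    final boolean authoritative;

    ListingResult(List<String> children, boolean authoritative) {
        this.children = children;
        this.authoritative = authoritative;
    }
}

/** Hypothetical helper holding the client-side policy decision. */
final class AuthoritativePolicy {
    /**
     * The client may skip the backing store (e.g. S3) only when it has opted in
     * via allow.authoritative AND the store marked this result as authoritative.
     * A store that never sets the flag makes the client setting a no-op.
     */
    static boolean canSkipBackingStore(boolean allowAuthoritative, ListingResult cached) {
        return allowAuthoritative && cached != null && cached.authoritative;
    }
}
```

With this shape, either side opting out is enough to force a check against the 
backing store.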

fs.<client>.metadatastore.class
- Configure which MetadataStore implementation to use, if any.
- This may replace the fs.s3a.s3guard.enabled setting proposed in the design 
  doc?

fs.metadatastore.<impl-name>.fullycache.directories
- If the metadata store implementation supports being authoritative on 
  directory listings, this will cause it to return DirectoryListMetadata (name 
  tbd) results with fullyCached=true when it has the complete directory listing.
- If the metadata store implementation does not support this, it should log an 
  error.  The client will still work correctly, as the implementation will 
  never claim to fully cache listings / PathMetadata.
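
As a rough sketch of the store side, assuming a simple in-memory 
implementation: InMemoryMetadataStoreSketch, putChild, putFullListing and 
listChildren are hypothetical names, and DirectoryListMetadata is still only a 
placeholder name.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Placeholder result type (name tbd): known children plus the fullyCached flag. */
final class DirectoryListMetadata {
    final Set<String> children;
    final boolean fullyCached;

    DirectoryListMetadata(Set<String> children, boolean fullyCached) {
        this.children = children;
        this.fullyCached = fullyCached;
    }
}

/** Hypothetical in-memory store that tracks which directories it knows completely. */
final class InMemoryMetadataStoreSketch {
    private final boolean fullyCacheDirectories;   // the fullycache.directories setting
    private final Map<String, Set<String>> children = new HashMap<>();
    private final Set<String> completeDirs = new HashSet<>();

    InMemoryMetadataStoreSketch(boolean fullyCacheDirectories) {
        this.fullyCacheDirectories = fullyCacheDirectories;
    }

    /** Records a single child; this alone does not make the listing complete. */
    void putChild(String dir, String child) {
        children.computeIfAbsent(dir, k -> new TreeSet<>()).add(child);
    }

    /** Records a listing the caller knows to be complete (e.g. after a full list call). */
    void putFullListing(String dir, Collection<String> all) {
        children.put(dir, new TreeSet<>(all));
        completeDirs.add(dir);
    }

    /** Returns null for an unknown directory; otherwise a copy of what is cached. */
    DirectoryListMetadata listChildren(String dir) {
        Set<String> known = children.get(dir);
        if (known == null) {
            return null;
        }
        boolean complete = fullyCacheDirectories && completeDirs.contains(dir);
        return new DirectoryListMetadata(new TreeSet<>(known), complete);
    }
}
```

With the setting off, the store still caches entries but never claims a listing 
is complete, so the client always falls back to the backing store.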

We could name this ...<impl-name>.authoritative.directories instead.  We could 
also add an analogue for files:  ...<impl-name>.authoritative.files.  In my 
prototype I assumed get() on a single Path could always be authoritative.  I 
could go either way.

Thoughts?

> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
