[
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-13345:
-----------------------------------
Attachment: HADOOP-13345.prototype1.patch
            S3GuardImprovedConsistencyforS3A.pdf
I'm attaching a design document and a prototype patch that I've been testing.
The design document describes the proposed solution in more detail. To
summarize the prototype patch:
* There is a new module: hadoop-tools/hadoop-s3guard. It was convenient to
structure the prototype this way, isolated in a separate module, to avoid
conflicts with the ongoing hadoop-aws work. I'm not certain this is the best
permanent structure. It might make more sense to embed it all within
hadoop-aws. There are pros and cons either way.
* The hadoop-s3guard module defines a {{ConsistentS3AFileSystem}} class, which
is a subclass of {{S3AFileSystem}}. This is the main entry point for this
work. It overrides several methods to coordinate with the consistent external
store.
* {{ConsistentStore}} defines the interface expected from the strongly
consistent store. There is currently one implementation using DynamoDB as the
back-end: {{DynamoDBConsistentStore}}.
* Even though it's a prototype, I wrote JavaDocs to explain what is happening
at the code level. I think the JavaDocs serve as a good companion piece to the
design doc.
* For testing, I have used some Maven trickery to reuse all S3A test suites
from hadoop-aws within hadoop-s3guard. src/test/resources/core-site.xml
rewires the s3a: scheme so that all tests are executing against
{{ConsistentS3AFileSystem}}. All of these tests are passing currently. This
provides basic coverage of the hadoop-s3guard code paths and confirmation that
the work hasn't yet introduced regressions.
* Currently missing is any dedicated testing that specifically tries to
trigger eventually consistent behavior and confirm that S3Guard handles it
correctly. This means we don't yet have automated tests that confirm S3Guard
does what it needs to do. Since eventually consistent behavior is
non-deterministic, I think we'll need to explore mock-based approaches that
simulate eventual consistency by returning incomplete or outdated results on
the first try, and then complete results on subsequent retries. We'll need to
combine this with tests that really integrate with S3.
* Also currently missing is any form of end-user documentation.
* I have marked various TODOs in the code to indicate important work items that
remain to be done.
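
As a rough illustration of the mock-based testing idea mentioned above, the
sketch below shows one way a test double could return an incomplete listing on
the first call and a complete listing on later calls. Every name here
({{Lister}}, {{FlakyLister}}, {{listUntilConsistent}}) is hypothetical and does
not come from the prototype patch. The {{expected}} set is only a stand-in for
whatever the strongly consistent store has recorded; the design document, not
this sketch, describes how S3Guard actually reconciles the two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the prototype patch. */
final class EventualConsistencySketch {

  /** Minimal stand-in for a listing call against an object store. */
  interface Lister {
    List<String> list(String prefix);
  }

  /** Mock store whose first {@code staleCalls} listings omit the newest matching key. */
  static final class FlakyLister implements Lister {
    private final List<String> keys = new ArrayList<>();
    private int staleCallsRemaining;

    FlakyLister(int staleCalls) {
      this.staleCallsRemaining = staleCalls;
    }

    void put(String key) {
      keys.add(key);
    }

    @Override
    public List<String> list(String prefix) {
      List<String> result = new ArrayList<>();
      for (String key : keys) {
        if (key.startsWith(prefix)) {
          result.add(key);
        }
      }
      if (staleCallsRemaining > 0 && !result.isEmpty()) {
        staleCallsRemaining--;
        // Simulate eventual consistency: the latest write is not yet visible.
        result.remove(result.size() - 1);
      }
      return result;
    }
  }

  /**
   * Re-lists until every key in {@code expected} is visible, or gives up.
   * {@code expected} is an assumption standing in for the consistent store's view.
   */
  static List<String> listUntilConsistent(
      Lister lister, String prefix, Set<String> expected, int maxAttempts) {
    List<String> last = new ArrayList<>();
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      last = lister.list(prefix);
      if (last.containsAll(expected)) {
        return last;
      }
    }
    throw new IllegalStateException(
        "listing still incomplete after " + maxAttempts + " attempts: " + last);
  }

  public static void main(String[] args) {
    FlakyLister store = new FlakyLister(1);
    store.put("dir/a");
    store.put("dir/b");
    Set<String> expected = new HashSet<>(Arrays.asList("dir/a", "dir/b"));
    System.out.println(listUntilConsistent(store, "dir/", expected, 3));
  }
}
```

Because the mock is deterministic, a test can assert both that the first
listing is incomplete and that a retry recovers the full result. It would
complement, not replace, the tests that integrate with real S3.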
> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch,
> S3GuardImprovedConsistencyforS3A.pdf
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a
> stronger consistency model than what is currently offered. The solution
> coordinates with a strongly consistent external store to resolve
> inconsistencies caused by the S3 eventual consistency model.