[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13345:
-----------------------------------
    Attachment: HADOOP-13345.prototype1.patch
                S3GuardImprovedConsistencyforS3A.pdf

I'm attaching a design document and a prototype patch that I've been testing.  
The design document describes the proposed solution in more detail.  To 
summarize the prototype patch:

* There is a new module: hadoop-tools/hadoop-s3guard.  It was convenient to 
structure the prototype this way, isolated in a separate module, to avoid 
conflicts with the ongoing hadoop-aws work.  I'm not certain this is the best 
permanent structure.  It might make more sense to embed it all within 
hadoop-aws.  There are pros and cons either way.
* The hadoop-s3guard module defines a {{ConsistentS3AFileSystem}} class, which 
is a subclass of {{S3AFileSystem}}.  This is the main entry point for this 
work.  It overrides several methods to coordinate with the consistent external 
store.
* {{ConsistentStore}} defines the interface expected from the strongly 
consistent store.  There is currently one implementation using DynamoDB as the 
back-end: {{DynamoDBConsistentStore}}.
* Even though it's a prototype, I wrote Javadoc comments to explain what is 
happening at the code level.  I think the Javadoc serves as a good companion 
piece to the design doc.
* For testing, I have used some Maven trickery to reuse all S3A test suites 
from hadoop-aws within hadoop-s3guard.  src/test/resources/core-site.xml 
rewires the s3a: scheme so that all tests execute against 
{{ConsistentS3AFileSystem}}.  All of these tests currently pass.  This 
provides basic coverage of the hadoop-s3guard code paths and confirms that 
the work hasn't yet introduced regressions.
* Currently missing is any dedicated testing that specifically tries to 
trigger eventually consistent behavior and confirm that S3Guard handles it 
successfully.  This means that we don't yet have automated testing that really 
confirms S3Guard does what it needs to do.  Since eventually consistent 
behavior is non-deterministic, I think we'll need to explore mock-based 
approaches that simulate eventual consistency by returning incomplete or 
outdated results on the first try, and then complete results on subsequent 
retries.  We'll need to combine this with tests that really integrate with S3.
* Also currently missing is any form of end user documentation.
* I have marked various TODOs in the code to indicate important work items that 
remain to be done.
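
To make the mock-based testing idea above a little more concrete, here is a 
minimal sketch of what such a mock might look like.  None of this is taken 
from the prototype patch: {{Lister}}, {{LaggingLister}} and 
{{listUntilComplete}} are hypothetical names, and the {{expected}} list is 
only a stand-in for whatever a strongly consistent store would report.  The 
mock returns an incomplete listing for a configurable number of calls and 
complete results after that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: every name here is hypothetical, not from the patch. */
final class EventualConsistencySketch {

  /** Minimal stand-in for an object store listing call. */
  interface Lister {
    List<String> list(String prefix);
  }

  /** Mock that returns an incomplete listing for the first staleCalls calls. */
  static final class LaggingLister implements Lister {
    private final List<String> complete;
    private final int staleCalls;
    private final AtomicInteger calls = new AtomicInteger();

    LaggingLister(List<String> complete, int staleCalls) {
      this.complete = new ArrayList<>(complete);
      this.staleCalls = staleCalls;
    }

    @Override
    public List<String> list(String prefix) {
      int n = calls.incrementAndGet();
      List<String> matches = new ArrayList<>();
      for (String key : complete) {
        if (key.startsWith(prefix)) {
          matches.add(key);
        }
      }
      if (n <= staleCalls && !matches.isEmpty()) {
        // Drop the last entry to simulate a listing that has not caught up.
        matches.remove(matches.size() - 1);
      }
      return matches;
    }

    int callCount() {
      return calls.get();
    }
  }

  /**
   * Retries the listing until it contains every expected key, or fails
   * after maxAttempts.  The expected list is an assumption standing in for
   * the view held by a strongly consistent store.
   */
  static List<String> listUntilComplete(Lister lister, String prefix,
      List<String> expected, int maxAttempts) {
    List<String> last = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      last = lister.list(prefix);
      if (last.containsAll(expected)) {
        return last;
      }
    }
    throw new IllegalStateException("listing still incomplete after "
        + maxAttempts + " attempts: " + last);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("dir/a", "dir/b", "dir/c");
    LaggingLister mock = new LaggingLister(keys, 1);
    List<String> result = listUntilComplete(mock, "dir/", keys, 3);
    // Prints: [dir/a, dir/b, dir/c] after 2 calls
    System.out.println(result + " after " + mock.callCount() + " calls");
  }
}
```

Because the number of stale responses is fixed rather than random, a test 
built this way is deterministic.  It would complement, not replace, the 
tests that really integrate with S3.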


> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, 
> S3GuardImprovedConsistencyforS3A.pdf
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
