[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

Chris Nauroth (JIRA) Sat, 23 Jul 2016 14:16:09 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390848#comment-15390848 ]


Chris Nauroth commented on HADOOP-13345:
----------------------------------------

[~liuml07], thank you for your feedback.

bq. Besides the metrics2, do you plan to support statistics (as subclass of 
StorageStatistics probably)?

I hadn't considered it, but yes, I think we can investigate adding statistics 
specific to S3Guard's implementation.

bq. As there is no limit to the number of objects that can be stored in an S3 
bucket, the S3 bucket may be very large. In this case, the consistency check 
requests may all go to a single DynamoDB table. S3Guard may suffer from low 
provisioned capacity units (read and write). To avoid this, customers need to 
monitor and provision the table throughput. I suggest we consider this as the 
third "potential drawback" when using DynamoDB as a consistency store. See page 
11 of the design doc. I think using the namenode should work just fine 
regarding the operational overhead.

Management of provisioned throughput is an additional source of operational 
complexity, but I failed to call that out specifically in the first revision of 
the document.  I'll add it in the next revision.

bq. The default value of fs.s3a.s3guard.fail.on.error is false, but it should 
be true as indicated by the config key description. I believe this is an 
omission.

Yes, thank you for catching it.

bq. As to the exponential backoff strategy for recheck, would jitter be 
helpful?

Yes, FWIW, I consider jitter important enough that it should be a part of any 
exponential back-off implementation.  I tend to think of it as implicit 
whenever anyone uses the phrase "exponential back-off", but that's not 
necessarily true, so I'll state it explicitly in the next revision.
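
For illustration only, here is a minimal sketch of exponential back-off with 
"full jitter", where each delay is drawn uniformly from zero up to a capped, 
doubling ceiling.  The class name {{JitteredBackoff}}, its parameters, and the 
choice of full jitter are assumptions made for this example; they are not taken 
from the prototype patch or the design document.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of S3Guard or the patch. */
final class JitteredBackoff {
  private final long baseMillis;
  private final long capMillis;
  private final Random random;

  JitteredBackoff(long baseMillis, long capMillis, Random random) {
    if (baseMillis <= 0 || capMillis < baseMillis) {
      throw new IllegalArgumentException("need 0 < baseMillis <= capMillis");
    }
    this.baseMillis = baseMillis;
    this.capMillis = capMillis;
    this.random = random;
  }

  /** Ceiling for a 0-based attempt: min(cap, base * 2^attempt), overflow-safe. */
  long ceilingMillis(int attempt) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    long ceiling = baseMillis;
    for (int i = 0; i < attempt && ceiling < capMillis; i++) {
      // Doubling is only safe while ceiling <= cap / 2; otherwise clamp to cap.
      ceiling = (ceiling > capMillis / 2) ? capMillis : ceiling * 2;
    }
    return ceiling;
  }

  /** Full jitter: a uniformly random delay in [0, ceiling). */
  long nextDelayMillis(int attempt) {
    return (long) (random.nextDouble() * ceilingMillis(attempt));
  }
}
```

A caller would sleep for {{nextDelayMillis(attempt)}} before each recheck.  
Randomizing the delay spreads out retries from clients that all failed at the 
same moment, so they do not retry in lockstep.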

bq. I think the design doc could also discuss the ConsistentStore methods that 
a consistent store should implement, plus the DynamoDB table schema/index 
design. I saw that the code discusses alternative schema ideas, which is 
helpful.

Yes, I can fold this information into the design document.  There is a balance 
to strike as I expect some of these aspects to evolve during implementation, 
which risks invalidating an overly prescriptive upfront design document.  We'll 
figure out that balance as we go.

bq. DescendantsIterator.java claims to implement pre-order depth-first 
traversal (DFS) of a path and all of its descendants recursively. The example 
given was actually a breadth-first traversal (BFS). I checked the code and 
think that it did implement a BFS, which conforms with the example. I think 
this is an omission in the javadoc.

I'll need to revisit this, because I think I actually have a bug in here right 
now.  My intent was to match the iteration order as would be seen through the 
S3 object listings performed inside {{S3AFileSystem}} during recursive deletes 
and renames.  I believed matching the iteration order would make it easier to 
reason about failure modes.  However, I now realize that it's almost never 
going to match up exactly anyway, because S3 won't have a key for every 
intermediate directory, but I expect DynamoDB will.
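
To make the ordering distinction concrete, here is a small sketch contrasting 
pre-order DFS with BFS over a hypothetical in-memory directory tree.  The class 
{{TraversalOrders}}, the {{sampleTree}} helper, and the paths are illustrative 
assumptions; this is not the code of {{DescendantsIterator}}.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical example; not the DescendantsIterator implementation. */
final class TraversalOrders {

  /** Sample tree mapping each parent path to its ordered child paths. */
  static Map<String, List<String>> sampleTree() {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("/", Arrays.asList("/a", "/b"));
    tree.put("/a", Arrays.asList("/a/x", "/a/y"));
    tree.put("/b", Arrays.asList("/b/z"));
    return tree;
  }

  /** Pre-order DFS: visit a path, then each child's entire subtree in turn. */
  static List<String> preOrderDfs(Map<String, List<String>> tree, String root) {
    List<String> visited = new ArrayList<>();
    Deque<String> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      String path = stack.pop();
      visited.add(path);
      List<String> children = tree.getOrDefault(path, Collections.<String>emptyList());
      // Push in reverse so that the first child is popped (visited) first.
      for (int i = children.size() - 1; i >= 0; i--) {
        stack.push(children.get(i));
      }
    }
    return visited;
  }

  /** BFS: visit every path at one depth before any path at the next depth. */
  static List<String> breadthFirst(Map<String, List<String>> tree, String root) {
    List<String> visited = new ArrayList<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String path = queue.poll();
      visited.add(path);
      queue.addAll(tree.getOrDefault(path, Collections.<String>emptyList()));
    }
    return visited;
  }
}
```

On the sample tree, {{preOrderDfs}} yields /, /a, /a/x, /a/y, /b, /b/z, while 
{{breadthFirst}} yields /, /a, /b, /a/x, /a/y, /b/z.  The only structural 
difference is a stack versus a queue for the pending paths.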

bq. In DynamoDBConsistentStore#listChildren(), we can use key condition 
expression instead of key conditions in pathToParentEq for the query request.

Are you recommending this based on the fact that the AWS SDK JavaDocs for 
{{withKeyConditions}} describe it as a "legacy parameter", or is there 
something more to it?  This is my first time working with DynamoDB, so I'm 
learning as I go.

All of the other code suggestions look great to me.  Thanks!


> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

