[
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390848#comment-15390848
]
Chris Nauroth commented on HADOOP-13345:
----------------------------------------
[~liuml07], thank you for your feedback.
bq. Besides the metrics2, do you plan to support statistics (as subclass of
StorageStatistics probably)?
I hadn't considered it, but yes, I think we can investigate adding statistics
specific to S3Guard's implementation.
bq. As there is no limit to the number of objects that can be stored in a S3
bucket, the S3 bucket may be very large. In this case, the consistency check
requests may go to a single DynamoDB talbe. S3Guard may suffer from the low
capacity units (read and write). To avoid this, the customers need to monitor
and provision the table throughput. I suggest we consider this as the third
"potential drawback" when using DynamoDB as a consistency store. See page 11 of
design doc. I think using namenode should work just fine regarding the
operation overhead.
Management of provisioned throughput is an additional source of operational
complexity, but I failed to call that out specifically in the first revision of
the document. I'll add it in the next revision.
bq. fs.s3a.s3guard.fail.on.error the default value is false, which should be
true as indicated by the config key description. I believe this is an omission.
Yes, thank you for catching it.
bq. As to the exponential backoff strategy for recheck, will the jitter be
helpful?
Yes, FWIW, I consider jitter important enough that it should be a part of any
exponential back-off implementation. I tend to think of it as implicit
whenever anyone uses the phrase "exponential back-off", but that's not
necessarily true, so I'll state it explicitly in the next revision.
bq. I think we can also discuss on the ConsistentStore methods that a
consistent store should implement in the design doc. Plus the DynamoDB Table
scheme/index design. I saw in the code there is discussion about alternative
schema ideas which is helpful.
Yes, I can fold this information into the design document. There is a balance
to strike as I expect some of these aspects to evolve during implementation,
which risks invalidating an overly prescriptive upfront design document. We'll
figure out that balance as we go.
bq. DescendantsIterator.java claims to implement preordering depth-first
traversal (DFS) of a path and all of its descendants recursively. The example
given was actually a breath-first traversal (BFS). I checked the code and think
that it did implement a BFS, which conforms with the example. I think this is
an omission in the javadoc.
I'll need to revisit this, because I think I actually have a bug in here right
now. My intent was to match the iteration order as would be seen through the
S3 object listings performed inside {{S3AFileSystem}} during recursive deletes
and renames. I believed matching the iteration order would make it easier to
reason about failure modes. However, I now realize that it's almost never
going to match up exactly anyway, because S3 won't have a key for every
intermediate directory, but I expect DynamoDB will.
bq. In DynamoDBConsistentStore#listChildren(), we can use key condition
expression instead of key conditions in pathToParentEq for the query request.
Are you recommending this based on the fact that the AWS SDK JavaDocs for
{{withKeyConditions}} describe it as a "legacy parameter", or is there
something more to it? This is my first time working with DynamoDB, so I'm
learning as I go.
All of the other code suggestions look great to me. Thanks!
> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch,
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf,
> s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a
> stronger consistency model than what is currently offered. The solution
> coordinates with a strongly consistent external store to resolve
> inconsistencies caused by the S3 eventual consistency model.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]