[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378360#comment-15378360
 ] 

Mingliang Liu commented on HADOOP-13345:
----------------------------------------

Thanks, [~cnauroth], for the design doc and prototype patch. I like the proposal.

h6. Design doc:
# Besides metrics2, do you plan to expose statistics, probably as a subclass 
of {{StorageStatistics}}?
# As there is no limit to the number of objects that can be stored in an S3 
bucket, the bucket may be very large. In this case, all consistency check 
requests may go to a single DynamoDB table, and S3Guard may suffer when the 
table's provisioned read and write capacity units are too low. To avoid this, 
customers need to monitor and provision the table throughput. I suggest we 
list this as the third "potential drawback" of using DynamoDB as a consistency 
store (see page 11 of the design doc). I think using the namenode should work 
just fine with regard to this operational overhead.
# The default value of {{fs.s3a.s3guard.fail.on.error}} is false, but the 
config key description indicates it should be true. I believe this is an 
oversight.
# As to the exponential backoff strategy for recheck, would adding jitter be 
helpful? See https://www.awsarchitectureblog.com/2015/03/backoff.html.
# I think the design doc could also discuss the methods that a 
{{ConsistentStore}} should implement, plus the DynamoDB table schema and index 
design. I saw that the code discusses alternative schema ideas, which is 
helpful.
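
To make the jitter question in item 4 concrete, here is a minimal sketch of 
the "full jitter" variant: each delay is drawn uniformly from zero up to an 
exponentially growing ceiling. The class name {{JitteredBackoff}}, its methods, 
and the base/cap parameters are hypothetical and do not come from the patch or 
the design doc.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not part of the patch): exponential backoff with full jitter. */
final class JitteredBackoff {
  private final long baseMillis;
  private final long capMillis;

  JitteredBackoff(long baseMillis, long capMillis) {
    if (baseMillis <= 0 || capMillis < baseMillis || capMillis == Long.MAX_VALUE) {
      throw new IllegalArgumentException("need 0 < base <= cap < Long.MAX_VALUE");
    }
    this.baseMillis = baseMillis;
    this.capMillis = capMillis;
  }

  /** Upper bound for a 0-based attempt: min(cap, base * 2^attempt), without overflow. */
  long ceilingMillis(int attempt) {
    long ceiling = baseMillis;
    for (int i = 0; i < attempt && ceiling < capMillis; i++) {
      // Double while that stays within the cap; otherwise clamp to the cap.
      ceiling = (ceiling <= capMillis / 2) ? ceiling * 2 : capMillis;
    }
    return ceiling;
  }

  /** Delay before the next recheck: uniform in [0, ceilingMillis(attempt)]. */
  long nextDelayMillis(int attempt) {
    return ThreadLocalRandom.current().nextLong(ceilingMillis(attempt) + 1);
  }
}
```

A caller would sleep for {{nextDelayMillis(attempt)}} between rechecks. The 
random spread is meant to keep many clients that failed at the same moment 
from retrying in lockstep.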

h6. The patch:
# {{DescendantsIterator.java}} claims to implement a pre-order depth-first 
traversal (DFS) of a path and all of its descendants. The example given is 
actually a breadth-first traversal (BFS). I checked the code and think that it 
does implement a BFS, which conforms with the example. I think this is a 
mistake in the javadoc.
# In {{DynamoDBConsistentStore#initTable()}}, perhaps we can call 
{{dynamodb.getTable(tableName).waitForActiveOrDelete()}} instead of sleeping 
and polling manually.
# I think we can use the DynamoDB document API (refer to 
[here|http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/DynamoDB.html]).
 For example, if we use the @ThreadSafe {{Table}} class, we can avoid directly 
operating the {{AmazonDynamoDBClient}} object and setting the table name for 
each request.
# For {{DynamoDBConsistentStore#get()}}, do we need to set *ConsistentRead* to 
true for the {{getItem()}} request?
# In {{DynamoDBConsistentStore#listChildren()}}, we can use a key condition 
expression instead of the key conditions in {{pathToParentEq}} for the query 
request.
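
To illustrate the DFS versus BFS distinction raised in item 1, here is a small 
standalone sketch that walks the same directory tree both ways. The class 
{{TraversalOrders}}, the child map, and the sample paths are made up for 
illustration and are not taken from {{DescendantsIterator}}.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration (not from the patch) of the two visiting orders. */
final class TraversalOrders {

  /** Pre-order DFS: visit a path, then fully descend into each child in turn. */
  static List<String> preOrderDfs(String root, Map<String, List<String>> children) {
    List<String> visited = new ArrayList<>();
    Deque<String> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      String path = stack.pop();
      visited.add(path);
      List<String> kids = children.getOrDefault(path, Collections.<String>emptyList());
      // Push in reverse so that the first child is popped first.
      for (int i = kids.size() - 1; i >= 0; i--) {
        stack.push(kids.get(i));
      }
    }
    return visited;
  }

  /** BFS: visit every path at one depth before any path at the next depth. */
  static List<String> bfs(String root, Map<String, List<String>> children) {
    List<String> visited = new ArrayList<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      String path = queue.remove();
      visited.add(path);
      queue.addAll(children.getOrDefault(path, Collections.<String>emptyList()));
    }
    return visited;
  }

  /** Sample tree: "/" has children /a and /b; /a has /a/x and /a/y; /b has /b/z. */
  static Map<String, List<String>> sampleTree() {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("/", Arrays.asList("/a", "/b"));
    tree.put("/a", Arrays.asList("/a/x", "/a/y"));
    tree.put("/b", Arrays.asList("/b/z"));
    return tree;
  }
}
```

On the sample tree, the pre-order DFS yields {{/, /a, /a/x, /a/y, /b, /b/z}}, 
while the BFS yields {{/, /a, /b, /a/x, /a/y, /b/z}}. Checking which of the two 
sequences the javadoc example matches shows which traversal it describes.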

h6. Nits:
# I understand that the {{fs.s3a.s3guard.store.table.name.prefix}} was not used 
yet in the patch.
# {{S3AFileSystem#awsConf}} can be final?
# In {{DescendantsIterator#hasNext()}}, the statement {{\!(stack.isEmpty() && 
!children.hasNext());}} can be simplified to {{\!stack.isEmpty() || 
children.hasNext();}}. It's simpler to me.
# I may need to read {{DescendantsIterator#move()}} and the related helper 
methods more carefully, but it would be helpful to add some javadoc stating 
that DynamoDB does not allow updating key schema attributes, so a key change 
requires deleting the old item and putting a new one.
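
The {{hasNext()}} simplification in item 3 is an application of De Morgan's 
law. The sketch below checks the equivalence over all four input combinations. 
The class {{HasNextCondition}} and its boolean parameters are hypothetical 
stand-ins for {{stack.isEmpty()}} and {{children.hasNext()}}.

```java
/** Hypothetical check (not from the patch) that the two conditions agree. */
final class HasNextCondition {

  /** The form quoted from the patch: not (stack empty and no more children). */
  static boolean original(boolean stackEmpty, boolean childrenHasNext) {
    return !(stackEmpty && !childrenHasNext);
  }

  /** The suggested form: stack is non-empty, or there are more children. */
  static boolean simplified(boolean stackEmpty, boolean childrenHasNext) {
    return !stackEmpty || childrenHasNext;
  }

  /** Compares both forms on every combination of the two inputs. */
  static boolean equivalentForAllInputs() {
    boolean[] values = {false, true};
    for (boolean stackEmpty : values) {
      for (boolean childrenHasNext : values) {
        if (original(stackEmpty, childrenHasNext)
            != simplified(stackEmpty, childrenHasNext)) {
          return false;
        }
      }
    }
    return true;
  }
}
```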

Considering the dozen or so TODOs in the current patch, plus the user doc and 
tests, I agree with [[email protected]] that the work can be done in a feature 
branch, with this JIRA serving as an umbrella for subtasks, so that interested 
people (like me) can pick up small tasks and contribute.


> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, 
> S3GuardImprovedConsistencyforS3A.pdf
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
