[ https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802751#comment-15802751 ]

Mingliang Liu commented on HADOOP-13345:
----------------------------------------

Hi [~eddyxu],

You're right that in the current code users have to specify the defaultFS (via the 
configuration file or the -fs command-line option) in order to operate on the DDB 
metadata store directly. The S3 URI is used to create the AmazonS3 object along 
with the credential object (for both S3 and DDB).
# The AmazonS3 client, which is used for detecting the bucket region, can operate 
on any bucket; creating such a client does not bind it to any specific bucket 
(see the sketch after the code snippet below).
# As to credentials embedded in the URI (e.g. s3://user:pass@bucket/), they are 
optional and deprecated, and this pattern is not currently supported for DDB. 
However, DDBClientFactory uses the same {{createAWSCredentialProviderSet}} as 
S3ClientFactory does, so it would honor the creds in the name URI. The reason it 
is not yet supported is that after {{FS#initialize}}, S3AFS has stripped the 
creds and passes only the {{scheme://host}} URI when creating a MetadataStore. 
One possible fix is to pass the name URI, which still contains the creds, to 
S3Guard#getMetadataStore():
{code:title=S3AFileSystem#initialize()}
-      metadataStore = S3Guard.getMetadataStore(this);
+      metadataStore = S3Guard.getMetadataStore(this, name);
{code}
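
To illustrate point 1, here is a minimal sketch (AWS SDK for Java v1); the bucket 
name and region are placeholders for illustration, not values from the patch:
{code:title=Sketch: unbound AmazonS3 client detecting a bucket's region}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class BucketRegionSketch {
  public static void main(String[] args) {
    // A single client instance, not bound to any bucket. Credentials come
    // from the default provider chain (same spirit as
    // createAWSCredentialProviderSet).
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withRegion("us-east-1")  // any region works for building the client
        .build();

    // The same client can look up the location of any bucket the
    // credentials can access; "example-bucket" is a placeholder.
    String region = s3.getBucketLocation("example-bucket");
    System.out.println("bucket region: " + region);
  }
}
{code}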

For command-line operations, I think fs.defaultFS is a basic config for users, 
and specifying s3://bucket is not a heavy requirement. Still, we can remove this 
constraint:
# Option 1: The DDB table name has to be specified via configuration, and we 
assume the bucket name equals the DDB table name if the defaultFS is not provided 
(or is not S3). To determine the region, we still assume the S3 bucket with that 
name exists, and AmazonS3.getBucketLocation will return it.
# Option 2: The DDB table name and endpoint have to be specified via 
configuration. We can determine the DDB region from the DDB endpoint. This way, 
we don't need to know about any related S3 bucket at all for the DDB metadata 
store to operate.

I prefer the 2nd approach (a sketch follows below). I'm not sure both options 
will work, but I can put up a WIP patch soon; or, as you suggested, we can 
support this later.
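
To make Option 2 concrete, a rough sketch of creating the DDB client from just a 
configured table name and endpoint, deriving the region from the endpoint host 
name. The config keys and the parsing helper here are assumptions for 
illustration only, not necessarily what the final patch would use:
{code:title=Sketch: DDB client from table name + endpoint only (illustrative)}
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.util.AwsHostNameUtils;
import org.apache.hadoop.conf.Configuration;

public class DdbFromEndpointSketch {
  public static AmazonDynamoDB createClient(Configuration conf) {
    // Hypothetical config keys, just for illustration.
    String table = conf.getTrimmed("fs.s3a.s3guard.ddb.table");
    String endpoint = conf.getTrimmed("fs.s3a.s3guard.ddb.endpoint");
    // e.g. endpoint = "dynamodb.us-west-2.amazonaws.com" (bare host name)

    // Derive the signing region from the endpoint host name, so no S3
    // bucket lookup (getBucketLocation) is needed at all.
    String region = AwsHostNameUtils.parseRegion(endpoint, "dynamodb");

    AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
        .withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration(endpoint, region))
        .build();
    // The table named by "table" can then be loaded or created directly,
    // without ever touching an S3 bucket.
    return ddb;
  }
}
{code}
With something like this, the command-line tool would only need the table name 
and endpoint, and fs.defaultFS would not be required at all.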

> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



