[ https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802751#comment-15802751 ]
Mingliang Liu commented on HADOOP-13345:
----------------------------------------

Hi [~eddyxu],

You're right that in the current code users have to specify the defaultFS (via the configuration file, or the -fs command line option) to operate on the DDB metadata store directly. The s3 URI is used to create the AmazonS3 object along with the credential object (for both S3 and DDB).
# The AmazonS3 client, which is used for detecting the bucket region, can operate on any bucket; creating such an object is not bound to any specific bucket.
# As to credentials embedded in the URI (e.g. s3://user:pass@bucket/), they're optional and deprecated, and this pattern is not supported for DDB. However, the DDBClientFactory itself uses the same {{createAWSCredentialProviderSet}} as the S3ClientFactory does, so it would honor credentials in the URI. The reason it's not yet supported is that after {{FS#initialization}}, S3AFS has stripped the credentials and returns a {{scheme://host}}-only URI for creating a MetadataStore. One possible fix is to pass the name URI, which still contains the credentials, to S3Guard#getMetadataStore():
{code:title=S3AFileSystem#initialize()}
-    metadataStore = S3Guard.getMetadataStore(this);
+    metadataStore = S3Guard.getMetadataStore(this, name);
{code}

For command line operations, I think fs.defaultFS is a basic config for users, and specifying s3://bucket does not seem burdensome. Still, we could remove this constraint:
# Option 1: The DDB table name has to be specified via configuration, and we assume the bucket name is the DDB table name if the defaultFS is not provided (or is not S3). To determine the region of the bucket, we still assume the S3 bucket (whose name is the same as the DDB table name) exists, so AmazonS3.getBucketLocation will return the value.
# Option 2: The DDB table name and endpoint have to be specified via configuration, and we determine the DDB region from the DDB endpoint. This way, we don't need to know the related S3 bucket at all for the DDB metadata store to operate.
I prefer the 2nd approach.
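To illustrate Option 2, the region derivation could look like the following. This is only a minimal sketch, not existing Hadoop or AWS SDK code: the class and method names ({{DDBRegionParser}}, {{parseRegionFromEndpoint}}) are hypothetical, and it assumes endpoints of the standard {{dynamodb.<region>.amazonaws.com}} form.

```java
// Sketch for Option 2: derive the DynamoDB region from a configured
// endpoint, so no S3 bucket is needed to locate the metadata store.
// Assumes endpoints like "dynamodb.us-west-2.amazonaws.com" (an optional
// scheme prefix and trailing path are tolerated). Names are illustrative.
public final class DDBRegionParser {
  private DDBRegionParser() {
  }

  /** Extract the region token from a DynamoDB endpoint. */
  public static String parseRegionFromEndpoint(String endpoint) {
    // Strip an optional scheme prefix such as "https://".
    String host = endpoint.replaceFirst("^[a-zA-Z]+://", "");
    // Drop any path component after the host name.
    int slash = host.indexOf('/');
    if (slash >= 0) {
      host = host.substring(0, slash);
    }
    String[] parts = host.split("\\.");
    // Expect "dynamodb.<region>.amazonaws.com[...]".
    if (parts.length >= 3 && "dynamodb".equals(parts[0])) {
      return parts[1];
    }
    throw new IllegalArgumentException(
        "Cannot determine region from endpoint: " + endpoint);
  }
}
```

With this, a command-line invocation could resolve the DDB region purely from configuration, e.g. {{parseRegionFromEndpoint("dynamodb.us-west-2.amazonaws.com")}} yields "us-west-2", without ever contacting S3.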
I'm not sure both options work, but I can work on a WIP patch soon; or, as you suggested, we can support this later.

> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
>                 Key: HADOOP-13345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13345
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13345.prototype1.patch, S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a stronger consistency model than what is currently offered. The solution coordinates with a strongly consistent external store to resolve inconsistencies caused by the S3 eventual consistency model.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)