[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

Aaron Fabbri (JIRA) Mon, 12 Dec 2016 15:57:07 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743567#comment-15743567
 ]


Aaron Fabbri commented on HADOOP-13336:
---------------------------------------

Great summary [~steve_l]. I think being backward-compatible with existing 
configs and URIs in production is important.  These all seem reasonable, but 
URI compatibility seems to point to option A for me (if we want to keep it 
simple).  The annoying thing is that these are hard to change if we decide we 
want a different option. Which option are you leaning towards?  

{quote}
*Option A* per-bucket config.
Lets you define everything for a bucket.
Examples
s3a://olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set 
fs.s3a.bucket.olap2 in configuration
s3a://landsat : s3a URL s3a://landsat, with config set fs.s3a.landsat for 
anonymous credentials and no dynamo
{quote}

To avoid key space conflicts I'd suggest a prefix of 
fs.s3a.bucket.<bucket-name> instead of fs.s3a.<bucket-name>. Just in case 
someone has an s3 bucket named "endpoint", they'd use 
{{fs.s3a.bucket.endpoint.*}} instead of conflicting with {{fs.s3a.endpoint}}, 
etc.

This option seems pretty straightforward.  It should be backward compatible, 
since it requires no changes to URIs, and existing default or "all bucket" 
config keys continue to work the same.  For grabbing config values in S3A, 
we'd call some per-bucket Configuration wrapper that looks for the 
{{fs.s3a.bucket.<bucket-name>.*}} key first and, if that key is not set, 
returns whatever is in the non-bucket-specific config.
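
To make the lookup order concrete, here is a rough sketch of such a wrapper. 
It is only an illustration: {{PerBucketConf}} is a hypothetical name, and a 
plain Map stands in for Hadoop's Configuration so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-bucket wrapper; not an existing S3A class. */
public class PerBucketConf {
    private static final String BASE_PREFIX = "fs.s3a.";
    private static final String BUCKET_PREFIX = "fs.s3a.bucket.";

    // Assumption: a plain Map stands in for Hadoop's Configuration.
    private final Map<String, String> conf;
    private final String bucket;

    public PerBucketConf(Map<String, String> conf, String bucket) {
        this.conf = conf;
        this.bucket = bucket;
    }

    /**
     * Look up an option by its suffix after "fs.s3a." (e.g. "endpoint").
     * The bucket-specific key wins; otherwise fall back to the global key,
     * and finally to the supplied default.
     */
    public String get(String option, String defaultValue) {
        String value = conf.get(BUCKET_PREFIX + bucket + "." + option);
        if (value != null) {
            return value;
        }
        value = conf.get(BASE_PREFIX + option);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.endpoint", "global-endpoint");
        // A bucket literally named "endpoint" does not clash with the key above.
        conf.put("fs.s3a.bucket.endpoint.endpoint", "bucket-endpoint");

        System.out.println(new PerBucketConf(conf, "endpoint").get("endpoint", null));
        System.out.println(new PerBucketConf(conf, "olap2").get("endpoint", null));
    }
}
```

The endpoint values are placeholders.  The point is only that a bucket named 
"endpoint" resolves its own override, while any other bucket falls through to 
the existing global key.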

{quote}
*Option B* config via domain name in URL
This is what swift does: you define a domain, with the domain defining 
everything.
s3a://olap2.dynamo/data/2017 with config set fs.s3a.binding.dynamo
s3a://landsat.anon with config set fs.s3a.binding.anon for anonymous 
credentials and no dynamo
{quote}

As you mention, my desire for URI backward-compatibility implies we need an 
additional way to map a bucket to a domain, e.g. 
{{fs.s3a.domain.bucket.my-bucket=my-domain}}.  That seems a bit too complex, 
although it does buy us the ability to share one config across a set of 
buckets. 
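
For comparison, a sketch of the extra indirection this would need.  The 
{{fs.s3a.domain.bucket.*}} mapping key is the one suggested above; the 
assumption that options hang off {{fs.s3a.binding.<domain>.<option>}} and the 
class name {{DomainBindingLookup}} are mine, and a plain Map again stands in 
for Configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-step lookup: bucket -> domain -> binding option. */
public class DomainBindingLookup {
    static final String MAP_PREFIX = "fs.s3a.domain.bucket.";
    // Assumption: binding options live under fs.s3a.binding.<domain>.<option>.
    static final String BINDING_PREFIX = "fs.s3a.binding.";

    static String get(Map<String, String> conf, String bucket, String option) {
        // Step 1: find the domain this bucket is mapped to, if any.
        String domain = conf.get(MAP_PREFIX + bucket);
        if (domain != null) {
            // Step 2: look the option up in that domain's shared binding.
            String value = conf.get(BINDING_PREFIX + domain + "." + option);
            if (value != null) {
                return value;
            }
        }
        // Unmapped buckets, or options the binding omits, use the global key.
        return conf.get("fs.s3a." + option);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.endpoint", "global-endpoint");
        conf.put("fs.s3a.binding.my-domain.endpoint", "shared-endpoint");
        // Two buckets share one binding without any change to their URIs.
        conf.put("fs.s3a.domain.bucket.my-bucket", "my-domain");
        conf.put("fs.s3a.domain.bucket.other-bucket", "my-domain");

        System.out.println(get(conf, "my-bucket", "endpoint"));
        System.out.println(get(conf, "other-bucket", "endpoint"));
        System.out.println(get(conf, "unmapped", "endpoint"));
    }
}
```

Every lookup now takes two hops, and each bucket needs its own mapping entry, 
which is where the extra complexity comes from.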

Also, does this break folks who use FQDN bucket names?

{quote}
*Option C* Config via user:pass property in URL
This is a bit like Azure, where the FQDN defines the binding, and the username 
defines the bucket. Here I'm proposing the ability to define a new user which 
declares the binding info.
Examples
s3a://dynamo@olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set 
fs.s3a.binding.dynamo
s3a://anon@landsat : s3a URL s3a://landsat, with config set fs.s3a.binding.anon 
for anonymous credentials.
{quote}

Seems reasonable but the need to change URIs is unfortunate.
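
For reference, a small sketch of how the binding name could be pulled out of 
the user-info part of such a URI with plain {{java.net.URI}}.  The class and 
method names are hypothetical, and this says nothing about how S3A would 
actually wire it in.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for option C; not an existing S3A class. */
public class BindingFromUri {

    /** Returns the binding name from the user-info part, or null if absent. */
    static String bindingName(URI uri) {
        return uri.getUserInfo();
    }

    /** Rebuilds the URI without user info, i.e. the plain bucket URI. */
    static URI stripBinding(URI uri) {
        try {
            return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild " + uri, e);
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("s3a://dynamo@olap2/data/2017");
        // The binding name would select the fs.s3a.binding.<name> config set.
        System.out.println("fs.s3a.binding." + bindingName(uri));
        System.out.println(stripBinding(uri));
    }
}
```

Existing URIs such as s3a://olap2/data/2017 have no user info, so 
{{bindingName}} returns null for them and the default config would apply; 
anyone who wants a binding still has to rewrite their URIs.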




> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>
> S3a now supports different regions, by way of declaring the endpoint —but you 
> can't do things like read in one region, write back in another (e.g. a distcp 
> backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt 
> s3a://b2.seoul , then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, 
> etc, in the XML file. Would we need to do that much? It'd be simpler 
> initially to use a domain suffix of a URL to set the region of a bucket from 
> the domain and have the aws library sort the details out itself, maybe with 
> some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
