[
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743567#comment-15743567
]
Aaron Fabbri commented on HADOOP-13336:
---------------------------------------
Great summary [~steve_l]. I think being backward-compatible with existing
configs and URIs in production is important. These all seem reasonable, but
URI compatibility seems to point to option A for me (if we want to keep it
simple). The annoying thing is that these are hard to change if we decide we
want a different option. Which option are you leaning towards?
{quote}
*Option A* per-bucket config.
Lets you define everything for a bucket.
Examples
s3a://olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set
fs.s3a.bucket.olap2 in configuration
s3a://landsat : s3a URL s3a://landsat, with config set fs.s3a.landsat for
anonymous credentials and no dynamo
{quote}
To avoid key space conflicts I'd suggest a prefix of
fs.s3a.bucket.<bucket-name> instead of fs.s3a.<bucket-name>. Just in case
someone has an S3 bucket named "endpoint", they'd use
{{fs.s3a.bucket.endpoint.*}} instead of conflicting with {{fs.s3a.endpoint}},
etc.
This option seems pretty straightforward. It should be backward compatible, as
it requires no changes to URIs, and existing default or "all bucket" config keys
continue to work the same. For grabbing config values in S3A, we'd call some
per-bucket Configuration wrapper that looks for the
{{fs.s3a.bucket.<bucket-name>.*}} key first and, if that key is not set, returns
whatever is in the non-bucket-specific config.
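A minimal sketch of that lookup order, to make the fallback concrete. The class name {{PerBucketConf}} and its {{get}} method are hypothetical, and a plain {{Map}} stands in for Hadoop's Configuration so the example is self-contained. This is not an existing S3A API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-bucket config wrapper (illustration only, not S3A code).
 * A plain Map stands in for Hadoop's Configuration.
 */
final class PerBucketConf {
    private static final String BASE_PREFIX = "fs.s3a.";
    private static final String BUCKET_PREFIX = "fs.s3a.bucket.";

    private final Map<String, String> conf;
    private final String bucket;

    PerBucketConf(Map<String, String> conf, String bucket) {
        this.conf = new HashMap<>(conf);
        this.bucket = bucket;
    }

    /**
     * Look up an option by its suffix (the part after "fs.s3a."),
     * e.g. "endpoint". The bucket-specific key wins; otherwise the
     * non-bucket-specific key is used; otherwise the default.
     */
    String get(String suffix, String defaultValue) {
        String value = conf.get(BUCKET_PREFIX + bucket + "." + suffix);
        if (value != null) {
            return value;
        }
        value = conf.get(BASE_PREFIX + suffix);
        return value != null ? value : defaultValue;
    }
}
```

With {{fs.s3a.endpoint}} set globally and {{fs.s3a.bucket.olap2.endpoint}} set for one bucket, a wrapper for {{olap2}} would return the bucket-specific value, while a wrapper for any other bucket would return the global one. A bucket literally named "endpoint" reads {{fs.s3a.bucket.endpoint.endpoint}}, so it cannot collide with {{fs.s3a.endpoint}}.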
{quote}
*Option B* config via domain name in URL
This is what swift does: you define a domain, with the domain defining
everything.
s3a://olap2.dynamo/data/2017 with config set fs.s3a.binding.dynamo
s3a://landsat.anon with config set fs.s3a.binding.anon for anonymous
credentials and no dynamo
{quote}
As you mention, my desire for URI backward-compatibility implies we need an
additional way to map a bucket to a domain, e.g.
{{fs.s3a.domain.bucket.my-bucket=my-domain}}. That seems a bit too complex,
although it does buy us the ability to share a config over some set of buckets.
Also, does this break folks who use FQDN-style bucket names (names containing
dots)?
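To illustrate the FQDN concern, here is a rough sketch of how an Option B URI might be split into bucket and domain. Treating everything after the last dot as the domain is purely my assumption, since the proposal does not spell out the rule, and {{DomainUriSketch}} is a hypothetical name.

```java
import java.net.URI;

/** Hypothetical Option B host splitter (illustration only). */
final class DomainUriSketch {

    /**
     * Splits the URI host into {bucket, domain}, assuming the domain
     * is whatever follows the last dot. The domain is null when the
     * host contains no dot.
     */
    static String[] bucketAndDomain(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URI: " + uri);
        }
        int dot = host.lastIndexOf('.');
        if (dot < 0) {
            return new String[] {host, null};
        }
        return new String[] {host.substring(0, dot), host.substring(dot + 1)};
    }
}
```

Under that assumption {{s3a://olap2.dynamo/data/2017}} splits into bucket {{olap2}} and domain {{dynamo}} as intended, but an existing bucket named {{logs.example.com}} would be read as bucket {{logs.example}} with domain {{com}}, which is the breakage I am asking about.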
{quote}
*Option C* Config via user:pass property in URL
This is a bit like Azure, where the FQDN defines the binding, and the username
defines the bucket. Here I'm proposing the ability to define a new user which
declares the binding info.
Examples
s3a://dynamo@olap2/data/2017 : s3a URL s3a://olap2/data/2017, with config set
fs.s3a.binding.dynamo
s3a://anon@landsat : s3a URL s3a://landsat, with config set fs.s3a.binding.anon
for anonymous credentials.
{quote}
Seems reasonable but the need to change URIs is unfortunate.
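For reference, a small sketch of how the binding name could be pulled out of an Option C URI using only {{java.net.URI}}. The class and method names are hypothetical; only the {{URI}} calls are real JDK API.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical Option C helper (illustration only). */
final class UserInfoBindingSketch {

    /** Returns the binding name from the user-info part, or null if absent. */
    static String bindingOf(URI uri) {
        return uri.getUserInfo();
    }

    /** Rebuilds the URI without user-info, i.e. the plain s3a://bucket/path form. */
    static URI stripBinding(URI uri) {
        try {
            return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI: " + uri, e);
        }
    }
}
```

For {{s3a://dynamo@olap2/data/2017}} this gives binding {{dynamo}} and the plain URI {{s3a://olap2/data/2017}}. Existing URIs without a user part yield a null binding, so they could keep the default config, but any job that wants a non-default binding still has to change its URIs.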
> S3A to support per-bucket configuration
> ---------------------------------------
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
>
> S3a now supports different regions, by way of declaring the endpoint —but you
> can't do things like read in one region, write back in another (e.g. a distcp
> backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt and
> s3a://b2.seoul, then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username,
> etc, in the XML file. Would we need to do that much? It'd be simpler
> initially to use a domain suffix of a URL to set the region of a bucket from
> the domain and have the aws library sort the details out itself, maybe with
> some config options for working with non-AWS infra