[ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13336:
------------------------------------
    Attachment: HADOOP-13336-HADOOP-13345-002.patch

Patch 002. This one I like.

When a new FS instance is created for a bucket {{BUCKET}}, the supplied conf 
is cloned, with all {{fs.s3a.bucket.BUCKET.*}} properties copied onto the base 
{{fs.s3a.*}} options (excluding {{fs.s3a.impl}}, and excluding any attempt to 
overwrite options in the {{fs.s3a.bucket.}} namespace itself).
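
The propagation rule above can be sketched in plain Java (a minimal illustration using a {{Map}} rather than Hadoop's {{Configuration}} class; the class and method names here are hypothetical, not the patch's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-bucket option propagation: for bucket BUCKET, every
// fs.s3a.bucket.BUCKET.SUFFIX entry is copied over fs.s3a.SUFFIX in a
// clone of the configuration, skipping fs.s3a.impl and any attempt to
// write back into the fs.s3a.bucket. namespace.
public class PerBucketConfig {
  static Map<String, String> propagate(Map<String, String> conf, String bucket) {
    // clone: the caller's configuration is left untouched
    Map<String, String> cloned = new HashMap<>(conf);
    String bucketPrefix = "fs.s3a.bucket." + bucket + ".";
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(bucketPrefix)) {
        String generic = "fs.s3a." + e.getKey().substring(bucketPrefix.length());
        // never swap the filesystem implementation class, and never let a
        // per-bucket option recurse into the fs.s3a.bucket. namespace
        if (!generic.equals("fs.s3a.impl")
            && !generic.startsWith("fs.s3a.bucket.")) {
          cloned.put(generic, e.getValue());
        }
      }
    }
    return cloned;
  }
}
```

With this, a config carrying {{fs.s3a.bucket.landsat-pds.endpoint}} yields a per-instance clone whose {{fs.s3a.endpoint}} is the bucket-specific value.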

This lets you do things like declare different endpoints for different buckets:
{code}
  <property>
    <name>fs.s3a.bucket.landsat-pds.endpoint</name>
    <value>s3.amazonaws.com</value>
    <description>The endpoint for s3a://landsat-pds URLs</description>
  </property>
{code}

It will also handle auth mechanisms, fadvise policy, output tuning, etc., so 
it supports different buckets with different access accounts, remote regions, 
and so on.
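
For example, per-bucket credentials would follow the same pattern (assuming the existing {{fs.s3a.access.key}}/{{fs.s3a.secret.key}} options; the bucket name and values here are placeholders):
{code}
  <property>
    <name>fs.s3a.bucket.private-bucket.access.key</name>
    <value>ACCESS-KEY-FOR-PRIVATE-BUCKET</value>
  </property>
  <property>
    <name>fs.s3a.bucket.private-bucket.secret.key</name>
    <value>SECRET-KEY-FOR-PRIVATE-BUCKET</value>
  </property>
{code}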

Tests: yes, of the base propagation. 
I've also added an implicit one by removing the special code needed to specify 
a different endpoint for the test CSV file. Now you can change the default 
{{fs.s3a.endpoint}} to somewhere like Frankfurt, yet still use the landsat 
image, just by defining the new endpoint for that bucket.
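
Concretely, that pairing would look something like this ({{s3.eu-central-1.amazonaws.com}} being the Frankfurt endpoint):
{code}
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.eu-central-1.amazonaws.com</value>
  </property>
  <property>
    <name>fs.s3a.bucket.landsat-pds.endpoint</name>
    <value>s3.amazonaws.com</value>
  </property>
{code}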

Tested against S3A Frankfurt, first without the override (to verify that the 
default endpoint is picked up), then again with the overridden endpoint.

Documentation: yes, with examples covering endpoints and authentication. I also 
cut the section on CSV endpoint configuration, as it's implicitly covered by 
the new mechanism.

> S3A to support per-bucket configuration
> ---------------------------------------
>
>                 Key: HADOOP-13336
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13336
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13336-HADOOP-13345-001.patch, 
> HADOOP-13336-HADOOP-13345-002.patch
>
>
> S3a now supports different regions, by way of declaring the endpoint —but you 
> can't do things like read in one region, write back in another (e.g. a distcp 
> backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt 
> s3a://b2.seol , then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, 
> etc, in the XML file. Would we need to do that much? It'd be simpler 
> initially to use a domain suffix of a URL to set the region of a bucket from 
> the domain and have the aws library sort the details out itself, maybe with 
> some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
