[ 
https://issues.apache.org/jira/browse/HADOOP-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213244#comment-15213244
 ] 

Stephen Montgomery commented on HADOOP-12963:
---------------------------------------------

Hi Steve,
Thanks for the quick reply. This patch simply sets a flag on the Amazon S3 
client so that it uses path style access by default instead of virtual 
hosting - see com.amazonaws.services.s3.S3ClientOptions. The flag is set when 
S3AFileSystem initialises the AmazonS3Client. JetS3t has a similar property - 
see s3service.disable-dns-buckets at 
http://www.jets3t.org/toolkit/configuration.html. 
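
For anyone unfamiliar with the two addressing styles, here is a minimal sketch 
of how the request URL differs. S3UrlStyles and its methods are hypothetical 
names made up for illustration; they are not part of the AWS SDK or of Hadoop, 
and the bucket and key are placeholders.

```java
/** Hypothetical helper (not AWS SDK or Hadoop code) contrasting the two URL styles. */
final class S3UrlStyles {
    private S3UrlStyles() {}

    /** Virtual-hosted style: the bucket name becomes part of the host name. */
    static String virtualHosted(String endpointHost, String bucket, String key) {
        return "https://" + bucket + "." + endpointHost + "/" + key;
    }

    /** Path style: the host stays fixed and the bucket is the first path segment. */
    static String pathStyle(String endpointHost, String bucket, String key) {
        return "https://" + endpointHost + "/" + bucket + "/" + key;
    }

    public static void main(String[] args) {
        // Placeholder bucket and key, chosen only for the demonstration.
        System.out.println(virtualHosted("s3.amazonaws.com", "mybucket", "data/part-0000"));
        System.out.println(pathStyle("s3.amazonaws.com", "mybucket", "data/part-0000"));
    }
}
```

With path style the client only ever has to resolve the endpoint host itself, 
so no per-bucket DNS or /etc/hosts entries are needed.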

I submitted a test 
(org.apache.hadoop.fs.s3a.TestS3AConfiguration.shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty)
that sets the new Hadoop flag, initialises a new S3AFileSystem and checks that 
S3ClientOptions.isPathStyleAccess() returns true on the newly instantiated 
AmazonS3Client. The S3ClientOptions property is read via ugly reflection 
because it is not retrievable via the Amazon S3 SDK. 
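
The reflection trick looks roughly like the sketch below. This is not the code 
from the patch: FakeClient and FakeClientOptions are stand-in classes invented 
for illustration, and the field name "clientOptions" is an assumption about 
how such a client might store its options.

```java
import java.lang.reflect.Field;

/** Stand-in for an options class, only loosely modelled on S3ClientOptions. */
final class FakeClientOptions {
    private final boolean pathStyleAccess;

    FakeClientOptions(boolean pathStyleAccess) {
        this.pathStyleAccess = pathStyleAccess;
    }

    boolean isPathStyleAccess() {
        return pathStyleAccess;
    }
}

/** Stand-in for a client that keeps its options private and exposes no getter. */
final class FakeClient {
    private final FakeClientOptions clientOptions;

    FakeClient(FakeClientOptions clientOptions) {
        this.clientOptions = clientOptions;
    }
}

final class ReflectionProbe {
    private ReflectionProbe() {}

    /** Reads a private field declared directly on the target's class. */
    static Object readPrivateField(Object target, String fieldName)
            throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true); // bypass the private modifier
        return field.get(target);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        FakeClient client = new FakeClient(new FakeClientOptions(true));
        FakeClientOptions options =
                (FakeClientOptions) readPrivateField(client, "clientOptions");
        System.out.println(options.isPathStyleAccess());
    }
}
```

The obvious downside is that a check like this breaks if the field is ever 
renamed, which is why I call it ugly.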

When the test runs against "live" S3A buckets with path style access switched 
on, the buckets have to be created in the same region as the AmazonS3Client 
(with the default s3.amazonaws.com endpoint specified); otherwise a 301 error 
is thrown (see 
http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html). The test 
covers this as well. 

I have patched a running cluster and submitted client jobs using the new flag, 
and it works as expected: it removed the need to list all the virtual hosted 
buckets in the /etc/hosts file. I have also run manual tests specifying 
region.amazonaws.com as the custom S3A endpoint to bypass the 301 error when I 
have buckets in different regions. I also used an IPv4 address as the custom 
S3A endpoint, which is a known workaround for switching on path style access 
in the AmazonS3Client code itself.
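
The IPv4 workaround makes sense because a bucket name cannot be prepended to 
an IP address as a subdomain. A rough sketch of that kind of decision follows; 
it is only an illustration and not the SDK's actual logic. AddressingStyle and 
usePathStyle are hypothetical names, and the dotted-quad check is deliberately 
simplistic.

```java
/** Hypothetical sketch (not the SDK's real code) of choosing an addressing style. */
final class AddressingStyle {
    // Rough dotted-quad check; it does not verify that each octet is <= 255.
    private static final String IPV4_LITERAL = "\\d{1,3}(\\.\\d{1,3}){3}";

    private AddressingStyle() {}

    /**
     * Path style is used when the flag asks for it, or when the endpoint host
     * is an IPv4 literal and so cannot carry the bucket as a subdomain.
     */
    static boolean usePathStyle(boolean pathStyleFlag, String endpointHost) {
        return pathStyleFlag || endpointHost.matches(IPV4_LITERAL);
    }

    public static void main(String[] args) {
        // 10.0.0.5 is a placeholder address used only for the demonstration.
        System.out.println(usePathStyle(false, "10.0.0.5"));
        System.out.println(usePathStyle(false, "s3.amazonaws.com"));
        System.out.println(usePathStyle(true, "s3.amazonaws.com"));
    }
}
```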

I could have written a few more tests, maybe creating new buckets on the fly in 
different regions to test for the 301 error, but I don't know whether this 
error code is specific to AWS S3 (and never going to change). The actual S3A 
operations behave the same whether virtual hosting or path style access is 
used. The upshot is that I'm just setting a flag when the AmazonS3Client 
instance is created, and that small three-liner probably (!?!) doesn't warrant 
100 lines or so of JUnit code. If you think it does, though, I'll go ahead and 
do it... 

Thanks,
Stephen

> Allow using path style addressing for accessing the s3 endpoint
> ---------------------------------------------------------------
>
>                 Key: HADOOP-12963
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12963
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.7.1
>            Reporter: Andrew Baptist
>            Priority: Minor
>              Labels: features
>         Attachments: HADOOP-12963-001.patch, HADOOP-12963-1.patch, 
> hdfs-8728.patch.2
>
>
> There is no ability to specify using path style access for the s3 endpoint. 
> There are numerous non-amazon implementations of storage that support the 
> amazon API's but only support path style access such as Cleversafe and Ceph. 
> Additionally in many environments it is difficult to configure DNS correctly 
> to get virtual host style addressing to work



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
