[ https://issues.apache.org/jira/browse/HADOOP-12963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213244#comment-15213244 ]
Stephen Montgomery commented on HADOOP-12963:
---------------------------------------------
Hi Steve,
Thanks for the quick reply. This patch simply sets a flag on the Amazon S3
client so that it uses path-style access by default instead of virtual
hosting (see com.amazonaws.services.s3.S3ClientOptions). The flag is applied
when S3AFileSystem initialises the AmazonS3Client. JetS3t has a similar
property for this as well: see s3service.disable-dns-buckets at
http://www.jets3t.org/toolkit/configuration.html.
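To illustrate what the flag changes on the wire, here is a minimal sketch. It is not the patch itself and not AWS SDK code. The property name fs.s3a.path.style.access and the class S3AddressingSketch are assumptions for illustration only. It reads a boolean setting and builds the object URL in either addressing style.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch, not the HADOOP-12963 patch and not AWS SDK code: it only
// shows how one boolean setting changes the shape of the request URL.
public class S3AddressingSketch {

    // Assumed property name, used here for illustration only.
    static final String PATH_STYLE_KEY = "fs.s3a.path.style.access";

    // A missing or non-"true" value falls back to false, i.e. virtual hosting.
    static boolean pathStyleEnabled(Map<String, String> conf) {
        String value = conf.get(PATH_STYLE_KEY);
        return Boolean.parseBoolean(value);
    }

    static String objectUrl(String endpoint, String bucket, String key, boolean pathStyle) {
        if (pathStyle) {
            // Path style: a single host for all buckets; the bucket is the first path segment.
            return "https://" + endpoint + "/" + bucket + "/" + key;
        }
        // Virtual hosting: the bucket is prepended to the host name, so each bucket
        // needs its own resolvable DNS name (or an /etc/hosts entry).
        return "https://" + bucket + "." + endpoint + "/" + key;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Collections.singletonMap(PATH_STYLE_KEY, "true");
        boolean pathStyle = pathStyleEnabled(conf);
        // prints https://s3.amazonaws.com/mybucket/data/part-0000
        System.out.println(objectUrl("s3.amazonaws.com", "mybucket", "data/part-0000", pathStyle));
        // prints https://mybucket.s3.amazonaws.com/data/part-0000
        System.out.println(objectUrl("s3.amazonaws.com", "mybucket", "data/part-0000", false));
    }
}
```

The point of the comparison: with path style, every bucket shares one host name, which is why DNS (or /etc/hosts) only needs to resolve the endpoint itself.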
I submitted a test
(org.apache.hadoop.fs.s3a.TestS3AConfiguration.shouldBeAbleToSwitchOnS3PathStyleAccessViaConfigProperty)
that sets the new Hadoop flag, initialises a new S3AFileSystem and checks that
S3ClientOptions.isPathStyleAccess() returns true for the newly instantiated
AmazonS3Client. The S3ClientOptions property is read via ugly reflection, as
it is not retrievable via the Amazon S3 SDK.
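The reflection trick works roughly like the sketch below. FakeS3Client and ClientOptions are hypothetical stand-ins that only mirror the situation (an options object held in a private field with no getter); they are not the AWS SDK classes, and the field name "clientOptions" is an assumption, not the SDK's actual field.

```java
import java.lang.reflect.Field;

public class ReflectionPeek {

    // Hypothetical stand-in for an options object; NOT the AWS SDK class.
    static class ClientOptions {
        private final boolean pathStyleAccess;
        ClientOptions(boolean pathStyleAccess) { this.pathStyleAccess = pathStyleAccess; }
        boolean isPathStyleAccess() { return pathStyleAccess; }
    }

    // Hypothetical stand-in for a client that keeps its options private
    // and offers no public accessor for them.
    static class FakeS3Client {
        private final ClientOptions clientOptions;
        FakeS3Client(ClientOptions options) { this.clientOptions = options; }
    }

    // Reads a private field by name, bypassing normal access checks.
    // getDeclaredField only sees fields declared on the object's own class,
    // not inherited ones.
    static Object readPrivateField(Object target, String fieldName)
            throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        return field.get(target);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        FakeS3Client client = new FakeS3Client(new ClientOptions(true));
        ClientOptions options = (ClientOptions) readPrivateField(client, "clientOptions");
        // prints true
        System.out.println(options.isPathStyleAccess());
    }
}
```

Such a test is brittle by design: if the library renames or restructures the private field, the lookup fails with NoSuchFieldException rather than silently passing.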
When the test runs against "live" S3A buckets with path-style access switched
on, the buckets have to be created in the same region as the AmazonS3Client
(with the default s3.amazonaws.com endpoint specified); otherwise a 301 error
is thrown (see http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html).
The test checks for this case as well.
I have patched a running cluster and submitted client jobs using the new flag,
and it works as expected: it removed the need to list every virtual-hosted
bucket in the /etc/hosts file. I have also done manual tests specifying
region.amazonaws.com as a custom S3A endpoint to bypass the 301 error when I
have buckets in different regions. I also used an IPv4 address as the custom
S3A endpoint, which is a known workaround that makes the AmazonS3Client code
itself switch on path-style access.
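The IPv4 workaround follows from how the addressing decision can be made. The sketch below approximates that decision; it is not the AWS SDK's code, and the class and method names are hypothetical. It assumes the endpoint is passed as a bare host name with no port. Path style is chosen when the flag is set, or when the endpoint is a literal IPv4 address, since a bucket name cannot sensibly be prepended to an IP address as a subdomain.

```java
// Hypothetical sketch approximating the addressing decision; not AWS SDK code.
public class AddressingDecision {

    // True if host is a dotted-quad IPv4 literal such as 10.0.0.5 (no port).
    static boolean isIpv4Literal(String host) {
        String[] parts = host.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.length() > 3) {
                return false;
            }
            for (char c : part.toCharArray()) {
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(part) > 255) {
                return false;
            }
        }
        return true;
    }

    // Path style if explicitly requested, or forced by an IP-address endpoint.
    static boolean usePathStyle(boolean pathStyleFlag, String endpointHost) {
        return pathStyleFlag || isIpv4Literal(endpointHost);
    }

    public static void main(String[] args) {
        System.out.println(usePathStyle(false, "s3.amazonaws.com")); // prints false
        System.out.println(usePathStyle(false, "192.168.1.20"));     // prints true
        System.out.println(usePathStyle(true, "s3.amazonaws.com"));  // prints true
    }
}
```

The explicit flag makes the IP-address trick unnecessary: a host-name endpoint can keep working through DNS while still getting path-style requests.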
I could have written a few more tests, perhaps creating new buckets on the fly
in different regions to test for the 301 error, but I don't know whether this
error code is specific to AWS S3 (and never going to change). The behaviour of
the actual S3A operations doesn't vary whether virtual hosting or path-style
access is used. The upshot is that I'm just setting a flag when the
AmazonS3Client instance is created, and that small three-liner probably (!?!)
doesn't warrant 100 or so lines of JUnit code. If you think it does, though,
I'll go ahead and do it...
Thanks,
Stephen
> Allow using path style addressing for accessing the s3 endpoint
> ---------------------------------------------------------------
>
> Key: HADOOP-12963
> URL: https://issues.apache.org/jira/browse/HADOOP-12963
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.7.1
> Reporter: Andrew Baptist
> Priority: Minor
> Labels: features
> Attachments: HADOOP-12963-001.patch, HADOOP-12963-1.patch,
> hdfs-8728.patch.2
>
>
> There is no way to specify path style access for the s3 endpoint.
> There are numerous non-Amazon storage implementations that support the
> Amazon API but only support path style access, such as Cleversafe and Ceph.
> Additionally, in many environments it is difficult to configure DNS correctly
> to get virtual host style addressing to work.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)