[ 
https://issues.apache.org/jira/browse/HADOOP-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HADOOP-16950:
------------------------------------
    Labels: Endpoint ceph pull-request-available  (was: Endpoint ceph)

> Extend Hadoop S3a access from single endpoint to multiple endpoints
> -------------------------------------------------------------------
>
>                 Key: HADOOP-16950
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16950
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.1.3
>            Reporter: Ocean Lua
>            Priority: Major
>              Labels: Endpoint, ceph, pull-request-available
>         Attachments: HADOOP-16950-001.patch
>
>
> The Hadoop AWS client API (hadoop-aws) supports access through only a single 
> endpoint. However, object stores such as Ceph expose multiple endpoints, so 
> a single-endpoint client cannot make full use of the storage resources. To 
> address this, we created a new implementation of S3AFileSystem that supports 
> multi-endpoint access. Spreading requests across the endpoints is expected to 
> improve system performance.
> Usage:
>  1.Ensure the hadoop-aws API is available.
>  2.Copy hadoop-aws-3.1.3.jar and aws-java-sdk-bundle-1.11.271.jar to 
> directory share/hadoop/common/lib in hadoop (hadoop-aws-3.1.3.jar and 
> aws-java-sdk-bundle-1.11.271.jar are normally located at directory 
> share/hadoop/tools/lib).
>  3.In file etc/hadoop/hadoop-env.sh, add the following:
>  export HADOOP_CLASSPATH=/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.3.jar:/(hadoop root directory)/share/hadoop/common/lib/aws-java-sdk-bundle-1.11.271.jar:$HADOOP_CLASSPATH
>  4.Edit configuration file "core-site.xml" and set properties below:
>  <property>
>  <name>fs.s3a.s3.client.factory.impl</name>
>  <value>org.apache.hadoop.fs.s3a.MultiAddrS3ClientFactory</value>
>  </property>
>  <property>
>  <name>fs.s3a.endpoint</name>
>  <value>http://addr1:port1,http://addr2:port2,...</value>
>  </property>
>  5.Optional configuration in "core-site.xml":
>  <property>
>  <name>fs.s3a.S3ClientSelector.class</name>
>  <value>org.apache.hadoop.fs.s3a.RandomS3ClientSelector</value>
>  </property>
>  This property sets the S3A client selection policy. The default value is 
> org.apache.hadoop.fs.s3a.RandomS3ClientSelector, which selects a client at 
> random. It can instead be set to 
> org.apache.hadoop.fs.s3a.PathS3ClientSelector, which selects a client 
> according to the file path.
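
To illustrate the two selection policies described above, here is a minimal, self-contained sketch. It is not the code from the attached patch: the class `EndpointSelectionSketch` and its methods are hypothetical names, and hashing the path is only one plausible way a path-based selector could work. It assumes the `fs.s3a.endpoint` value is a plain comma-separated list, as in step 4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical illustration only; the real selectors in the patch may differ.
public class EndpointSelectionSketch {

    /** Splits a comma-separated endpoint value into individual endpoints. */
    public static List<String> parseEndpoints(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Random policy: each call may return any of the endpoints. */
    public static String selectRandom(List<String> endpoints, Random rng) {
        return endpoints.get(rng.nextInt(endpoints.size()));
    }

    /** Path policy (assumed hash-based): the same path always maps to the same endpoint. */
    public static String selectByPath(List<String> endpoints, String path) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(path.hashCode(), endpoints.size());
        return endpoints.get(index);
    }

    public static void main(String[] args) {
        List<String> endpoints =
                parseEndpoints("http://addr1:port1,http://addr2:port2");
        System.out.println(selectRandom(endpoints, new Random()));
        System.out.println(selectByPath(endpoints, "/data/part-00000"));
    }
}
```

A random policy spreads load without any knowledge of the request, while a path-based policy sends repeated requests for the same object to the same endpoint; which one suits a given cluster depends on the workload.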



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
