yandrey321 commented on code in PR #300:
URL: https://github.com/apache/ozone-site/pull/300#discussion_r2775554601
##########
docs/04-user-guide/01-client-interfaces/04-s3a.md:
##########
@@ -4,4 +4,186 @@ sidebar_label: s3a
 # s3a and Ozone
-**TODO:** File a subtask under [HDDS-9858](https://issues.apache.org/jira/browse/HDDS-9858) and complete this page or section.
+Ozone exposes an **S3-compatible REST interface** via the S3 Gateway. Hadoop's **S3A** filesystem (`s3a://`) is a cloud connector that translates the AWS S3 API into a Hadoop-compatible file system interface. Hadoop-style data analytics tools such as Hive, Impala, and Spark can access Ozone's S3 interface using the Hadoop S3A connector, so you can use Ozone buckets from existing Hadoop ecosystem tools without application changes.
+
+This page explains how to configure the Hadoop S3A client to use Ozone's S3 Gateway (s3g) and provides sample commands to access Ozone s3g using s3a. For details about the Ozone S3 Gateway itself (supported REST APIs, URL schemes, security), see the [S3 Protocol](./03-s3/01-s3-api.md) page. For more information about S3A, see the [official Hadoop S3A documentation](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html).
+
+## Prerequisites
+
+- A running Ozone cluster with the **S3 Gateway** enabled. You can start a Docker-based cluster (including S3 Gateway) as described in the [S3 Protocol](./03-s3/01-s3-api.md) documentation.
+- Ozone S3 endpoint (for example `http://localhost:9878` or a load balancer DNS name).
+- Hadoop distribution with the **`hadoop-aws`** module available. See the official Hadoop S3A documentation:
+  - [Hadoop-AWS: S3A client overview](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+  - [Connecting via S3A](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/connecting.html)
+
+## Configuring S3A for Ozone
+
+### Enable the S3A client
+
+Ensure the `hadoop-aws` module is on the client classpath. In a typical Hadoop installation:
+
+- Set `HADOOP_OPTIONAL_TOOLS` in `hadoop-env.sh` to include `hadoop-aws`, **or**
+- Add a dependency on `org.apache.hadoop:hadoop-aws` with the same version as `hadoop-common`.
+
+See the [Hadoop S3A Getting Started](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Getting_Started) section for details.
+
+### core-site.xml: point S3A to Ozone
+
+Add the following properties to the Hadoop configuration (for example `core-site.xml`) so that `s3a://` URIs use the Ozone S3 Gateway instead of AWS S3:
+
+```xml
+<property>
+  <name>fs.s3a.endpoint</name>
+  <value>http://ozone-s3g-host:9878</value>
+  <description>
+    Ozone S3 Gateway endpoint. Replace with your s3g hostname or load balancer.
+  </description>
+</property>
+
+<property>
+  <name>fs.s3a.endpoint.region</name>
+  <value>us-east-1</value>
+  <description>
+    Logical region name required by the S3A client. Ozone does not enforce regions,
+    but this must be a valid-looking value.
+  </description>
+</property>
+
+<property>
+  <name>fs.s3a.path.style.access</name>
+  <value>true</value>
+  <description>
+    Ozone S3 Gateway defaults to path-style URLs (http://host:9878/bucket),
+    so S3A should use path-style access.
+  </description>
+</property>
+```
+
+These properties follow the official S3A connection settings in [Connecting to an S3 store](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/connecting.html#Connection_Settings).
+
+### Recommended settings for Ozone
+
+Ozone S3 Gateway adds ETag support for S3 Multipart Upload (MPU). Object versioning and some other S3 behaviors may still differ from AWS S3. To avoid compatibility issues with older clients or when not using MPU, you can set these options when using S3A with Ozone:

Review Comment:
   It would be nice to describe how Ozone's behavior differs from AWS S3.

--
This is an automated message from the Apache Git Service.
