This is an automated email from the ASF dual-hosted git repository.
weichiu pushed a commit to branch HDDS-9225-website-v2
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/HDDS-9225-website-v2 by this
push:
new ae2e23044 HDDS-14505. [Website v2] [Docs] [User Guide] s3a (#300)
ae2e23044 is described below
commit ae2e23044e10ec989a8d0da698857293b6582a17
Author: KUAN-HAO HUANG <[email protected]>
AuthorDate: Sat Feb 7 07:14:35 2026 +0800
HDDS-14505. [Website v2] [Docs] [User Guide] s3a (#300)
Co-authored-by: Wei-Chiu Chuang <[email protected]>
---
docs/04-user-guide/01-client-interfaces/04-s3a.md | 184 +++++++++++++++++++++-
1 file changed, 183 insertions(+), 1 deletion(-)
diff --git a/docs/04-user-guide/01-client-interfaces/04-s3a.md
b/docs/04-user-guide/01-client-interfaces/04-s3a.md
index ccf6ae854..ddc0666d6 100644
--- a/docs/04-user-guide/01-client-interfaces/04-s3a.md
+++ b/docs/04-user-guide/01-client-interfaces/04-s3a.md
@@ -4,4 +4,186 @@ sidebar_label: s3a
# s3a and Ozone
-**TODO:** File a subtask under
[HDDS-9858](https://issues.apache.org/jira/browse/HDDS-9858) and complete this
page or section.
+Ozone exposes an **S3-compatible REST interface** via the S3 Gateway. Hadoop's **S3A** filesystem (`s3a://`) is a cloud connector that presents S3-compatible object storage through the Hadoop FileSystem API. Data analytics tools from the Hadoop ecosystem, such as Hive, Impala, and Spark, can therefore access Ozone's S3 interface through the S3A connector and use Ozone buckets without application changes.
+
+This page explains how to configure the Hadoop S3A client to use Ozone's S3 Gateway (s3g) and provides sample commands for accessing the gateway through S3A. For details about the Ozone S3 Gateway itself (supported REST APIs, URL schemes, security), see the [S3 Protocol](./03-s3/01-s3-api.md) page. For more information about S3A, see the [official Hadoop S3A documentation](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html).
+
+## Prerequisites
+
+- A running Ozone cluster with the **S3 Gateway** enabled. You can start a
Docker-based cluster (including S3 Gateway) as described in the [S3
Protocol](./03-s3/01-s3-api.md) documentation.
+- The Ozone S3 Gateway endpoint (for example `http://localhost:9878` or a load balancer DNS name).
+- A Hadoop distribution with the **`hadoop-aws`** module available. See the official Hadoop S3A documentation:
+ - [Hadoop-AWS: S3A client
overview](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
+ - [Connecting via
S3A](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/connecting.html)
+
+## Configuring S3A for Ozone
+
+### Enable the S3A client
+
+Ensure the `hadoop-aws` module is on the client classpath. In a typical Hadoop
installation:
+
+- Set `HADOOP_OPTIONAL_TOOLS` in `hadoop-env.sh` to include `hadoop-aws`,
**or**
+- Add a dependency on `org.apache.hadoop:hadoop-aws` with the same version as
`hadoop-common`.
+
+See the [Hadoop S3A Getting
Started](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Getting_Started)
section for details.
+
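+For example, on a tarball-based Hadoop installation the module can be enabled through `hadoop-env.sh`; the paths below are a sketch and depend on your installation layout:
+
+```bash
+# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh (path is illustrative):
+# pull the hadoop-aws module and its bundled AWS SDK onto the client classpath.
+export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
+
+# The hadoop-aws jar should now show up on the client classpath.
+hadoop classpath | tr ':' '\n' | grep -i hadoop-aws
+```
+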
+### core-site.xml: point S3A to Ozone
+
+Add the following properties to the Hadoop configuration (for example
`core-site.xml`) so that `s3a://` URIs use the Ozone S3 Gateway instead of AWS
S3:
+
+```xml
+<property>
+ <name>fs.s3a.endpoint</name>
+ <value>http://ozone-s3g-host:9878</value>
+ <description>
+ Ozone S3 Gateway endpoint. Replace with your s3g hostname or load balancer.
+ </description>
+</property>
+
+<property>
+ <name>fs.s3a.endpoint.region</name>
+ <value>us-east-1</value>
+ <description>
+ Logical region name required by the S3A client. Ozone does not enforce
regions,
+ but this must be a valid-looking value.
+ </description>
+</property>
+
+<property>
+ <name>fs.s3a.path.style.access</name>
+ <value>true</value>
+ <description>
+ Ozone S3 Gateway defaults to path-style URLs (http://host:9878/bucket),
+ so S3A should use path-style access.
+ </description>
+</property>
+```
+
+These properties follow the official S3A connection settings in [Connecting to
an S3
store](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/connecting.html#Connection_Settings).
+
+### Recommended settings for Ozone
+
+The Ozone S3 Gateway adds ETag support for S3 Multipart Upload (MPU), but object versioning and some other S3 behaviors may still differ from AWS S3. To avoid compatibility issues with older clients, or when MPU is not used, you can set the following options when using S3A with Ozone:
+
+```xml
+<property>
+ <name>fs.s3a.bucket.probe</name>
+ <value>0</value>
+ <description>
+ Disable the bucket existence probe at startup. This is the default in
recent Hadoop
+ versions and is recommended for third-party S3-compatible stores such as
Ozone.
+ </description>
+</property>
+
+<property>
+ <name>fs.s3a.change.detection.mode</name>
+ <value>none</value>
+ <description>Disable change detection; not applicable to Ozone
S3.</description>
+</property>
+```
+
+### Credentials
+
+Ozone uses the same AWS-style access key and secret key model for the S3
Gateway.
+
+- If **security is disabled**, any `AWS_ACCESS_KEY_ID` /
`AWS_SECRET_ACCESS_KEY` pair can be used.
+- If **security is enabled**, obtain a key and secret via `ozone s3 getsecret` (Kerberos authentication is required), as shown in the example after this list. See the [S3 Protocol — Security](./03-s3/01-s3-api.md#security) and [Securing S3](./03-s3/02-securing-s3.md) sections for details.
+
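+On a secure cluster the flow looks roughly like the sketch below; the Kerberos principal and the printed values are illustrative:
+
+```bash
+# Authenticate with Kerberos first (principal is illustrative).
+kinit [email protected]
+
+# Ask Ozone for an S3 access key / secret pair tied to this identity.
+ozone s3 getsecret
+# Example output (values are illustrative):
+#   awsAccessKey=testuser@EXAMPLE.COM
+#   awsSecret=<generated-secret>
+```
+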
+Configure S3A credentials in `core-site.xml`:
+
+```xml
+<property>
+ <name>fs.s3a.access.key</name>
+ <value>your-access-key</value>
+</property>
+
+<property>
+ <name>fs.s3a.secret.key</name>
+ <value>your-secret-key</value>
+</property>
+```
+
+Alternatively, use environment variables as documented in [Authenticating via
AWS environment
variables](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Authenticating_via_the_AWS_Environment_Variables):
+
+```bash
+export AWS_ACCESS_KEY_ID="your-access-key"
+export AWS_SECRET_ACCESS_KEY="your-secret-key"
+```
+
+:::note
+For generating and revoking Ozone S3 secrets, see the **Security** section of
the [S3 Protocol](./03-s3/01-s3-api.md#security) page.
+:::
+
+:::caution
+If the Ozone S3 Gateway is exposed over **HTTPS**, the JVM must trust the
gateway's TLS certificate. The Hadoop AWS client (`hadoop-aws`) uses the
default Java truststore; if the gateway uses a custom or internal CA, add that
CA to `JAVA_HOME/lib/security/jssecacerts` or configure the JVM truststore
accordingly. Otherwise S3A connections to the HTTPS endpoint may fail with
certificate errors.
+:::
+
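+If a custom CA is involved, the certificate can be imported with `keytool`, for example; the certificate path, alias, and store password below are illustrative:
+
+```bash
+# Import the gateway's CA certificate into the default JVM truststore
+# (certificate path, alias, and password are illustrative).
+keytool -importcert \
+  -keystore "$JAVA_HOME/lib/security/jssecacerts" \
+  -storepass changeit \
+  -alias ozone-s3g-ca \
+  -file /path/to/ozone-ca.pem \
+  -noprompt
+```
+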
+## Example: using `hadoop fs` with Ozone via S3A
+
+The examples below assume:
+
+- Ozone S3 Gateway is reachable at `http://localhost:9878`
+- `core-site.xml` is configured as above
+- An S3 bucket (for example `bucket1`) already exists (you can create it with
`aws s3api --endpoint http://localhost:9878 create-bucket --bucket bucket1`)
+
+S3A URLs use the form `s3a://<bucket>/<path>`. The bucket corresponds to an
Ozone bucket under the `/s3v` volume or a bucket link.
+
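+If the bucket does not exist yet, it can also be created, or exposed as a link, from the Ozone shell; the volume and bucket names below are illustrative:
+
+```bash
+# Create a bucket directly under the S3 volume (visible as s3a://bucket1/).
+ozone sh bucket create /s3v/bucket1
+
+# Or expose an existing bucket from another volume under /s3v as a bucket link.
+ozone sh bucket link /vol1/bucket1 /s3v/bucket1
+```
+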
+### List objects in an Ozone S3 bucket
+
+```bash
+hadoop fs -ls s3a://bucket1/
+```
+
+### Upload a local file to Ozone using S3A
+
+```bash
+hadoop fs -put /data/local-file.txt s3a://bucket1/path/local-file.txt
+```
+
+### Download from Ozone to local or HDFS
+
+```bash
+# To local filesystem
+hadoop fs -copyToLocal s3a://bucket1/path/file.txt /tmp/from-ozone.txt
+
+# Copy to HDFS
+hadoop fs -cp s3a://bucket1/path/file.txt hdfs:///user/test/from-ozone.txt
+```
+
+### Quick test with inline configuration
+
+If you cannot modify cluster-wide `core-site.xml`, you can pass S3A options on
the command line. Replace the endpoint, bucket, and credentials with your
values:
+
+```bash
+hadoop fs \
+ -D fs.s3a.endpoint=http://localhost:9878 \
+ -D fs.s3a.endpoint.region=us-east-1 \
+ -D fs.s3a.path.style.access=true \
+ -D fs.s3a.bucket.probe=0 \
+ -D fs.s3a.change.detection.mode=none \
+ -D fs.s3a.access.key=your-access-key \
+ -D fs.s3a.secret.key=your-secret-key \
+ -ls s3a://bucket1/
+```
+
+## Example: using distcp between HDFS and Ozone
+
+You can use S3A as a source or destination for `distcp` to move data between
HDFS and Ozone. Use the same S3A configuration as above.
+
+Copy from HDFS to Ozone:
+
+```bash
+hadoop distcp hdfs:///data/source/dir s3a://bucket1/backup/dir
+```
+
+Copy from Ozone to HDFS:
+
+```bash
+hadoop distcp s3a://bucket1/backup/dir hdfs:///data/restore/dir
+```
+
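+As with `hadoop fs`, the S3A options can also be passed to `distcp` on the command line when they are not set in `core-site.xml`; the endpoint and paths below are illustrative:
+
+```bash
+hadoop distcp \
+  -D fs.s3a.endpoint=http://localhost:9878 \
+  -D fs.s3a.path.style.access=true \
+  hdfs:///data/source/dir s3a://bucket1/backup/dir
+```
+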
+## Relation to Ozone S3 documentation
+
+This page describes using Ozone from the **Hadoop FileSystem** perspective
(S3A client). For REST API details, supported S3 operations, bucket linking,
and S3 security, see the [S3 Protocol](./03-s3/01-s3-api.md) and [Securing
S3](./03-s3/02-securing-s3.md) pages.
+
+For advanced S3A options (performance tuning, encryption, retries), refer to
the official [Hadoop S3A
documentation](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
and its sub-pages such as
[Performance](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html)
and
[Encryption](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/encryption.html).