This is an automated email from the ASF dual-hosted git repository.
weichiu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/master by this push:
new edf2e495f HDDS-14193. [Docs] Multi-cluster. (#346)
edf2e495f is described below
commit edf2e495fcf2db5727cc90788c7b6041dac01a24
Author: Wei-Chiu Chuang <[email protected]>
AuthorDate: Wed Mar 11 08:26:47 2026 -0700
HDDS-14193. [Docs] Multi-cluster. (#346)
---
.../02-configuration/07-cluster-architectures.mdx | 125 +++++++++++++++++++++
static/img/OzoneClusterArchitectures.png | Bin 0 -> 6946547 bytes
2 files changed, 125 insertions(+)
diff --git
a/docs/05-administrator-guide/02-configuration/07-cluster-architectures.mdx
b/docs/05-administrator-guide/02-configuration/07-cluster-architectures.mdx
new file mode 100644
index 000000000..0005f48f6
--- /dev/null
+++ b/docs/05-administrator-guide/02-configuration/07-cluster-architectures.mdx
@@ -0,0 +1,125 @@
+---
+title: Cluster Architectures
+sidebar_label: Cluster Architectures
+---
+
+# Ozone Deployment Architectures
+
+This document outlines different Ozone deployment architectures, from
single-cluster setups to multi-cluster and federated configurations. It also
provides the necessary client and service configurations for these advanced
setups.
+
+The following figure illustrates the four primary deployment architectures for
Ozone. Each is described in more detail in the sections below.
+
+
+
+### 1. Minimalist (Non-HA)
+
+* 1 Ozone Manager (OM)
+* 1 Storage Container Manager (SCM)
+* 3 Datanodes (DNs)
+* **Topology:** Single cluster, no high availability.
+* **Use Case:** Recommended for local testing, development, or small-scale,
non-critical environments.
+* **Example:**
[docker-compose.yaml](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozone/docker-compose.yaml)
+
+### 2. HA Cluster
+
+* 3 Ozone Managers (OMs)
+* 3 Storage Container Managers (SCMs)
+* 3+ Datanodes (DNs)
+* **Topology:** Single cluster, highly available.
+* **Use Case:** The standard architecture for most production deployments,
providing resilience against single points of failure.
+* **Example:** [docker-compose.yaml for
HA](https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozone-ha/docker-compose.yaml)
+
+### 3. Multi-Cluster
+
+* **Topology:** Two or more completely separate HA clusters. Each cluster
has its own set of OMs, SCMs, and DNs.
+* **Use Case:** Provides full physical and logical isolation between
clusters, ideal for separating different environments (e.g., dev and prod) or
different user groups with distinct storage and control planes.
+
+#### Multi-Cluster Client Configuration
+
+For a client to interact with multiple distinct clusters, its configuration
must list the service ID of each cluster's Ozone Manager service.
+
+The following properties are set in the client's `ozone-site.xml`:
+
+```xml
+ <property>
+ <name>ozone.om.service.ids</name>
+ <value>ozone1,ozone2</value>
+ <tag>OM, HA</tag>
+ <description>
+ A comma-separated list of all OM service IDs the client may need to
+ contact. This allows the client to locate different Ozone clusters.
+ </description>
+ </property>
+ <property>
+ <name>ozone.om.internal.service.id</name>
+ <value>ozone1</value>
+ <tag>OM, HA</tag>
+ <description>
+ The default OM service ID for this client. If not specified, the client
+ may need to explicitly reference a service ID for operations.
+ </description>
+ </property>
+```
+
+With this configuration, the client is aware of two clusters, `ozone1` and
`ozone2`, and will use `ozone1` by default.
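+
+The service IDs alone do not tell the client where each cluster's OMs run. The following is a minimal sketch of the per-service entries that would typically accompany them, assuming the standard OM HA properties `ozone.om.nodes.<serviceId>` and `ozone.om.address.<serviceId>.<nodeId>`. The node IDs and hostnames are hypothetical placeholders, and `ozone2` would need an equivalent set of entries.
+
+```xml
+  <!-- Sketch only: node IDs (om1..om3) and hostnames are hypothetical. -->
+  <property>
+    <name>ozone.om.nodes.ozone1</name>
+    <value>om1,om2,om3</value>
+  </property>
+  <property>
+    <name>ozone.om.address.ozone1.om1</name>
+    <value>om1.cluster1.example.com</value>
+  </property>
+  <property>
+    <name>ozone.om.address.ozone1.om2</name>
+    <value>om2.cluster1.example.com</value>
+  </property>
+  <property>
+    <name>ozone.om.address.ozone1.om3</name>
+    <value>om3.cluster1.example.com</value>
+  </property>
+```
+
+With entries like these in place for both service IDs, a path such as `ofs://ozone2/` can be resolved to the OMs of the second cluster.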
+
+To direct a CLI command to a specific cluster, use the appropriate service ID
parameter.
+
+**Example (SCM):** List SCM roles for a specific SCM service.
+```bash
+ozone admin scm roles --service-id=<scm_service_id>
+```
+
+**Example (OM):** List OM roles for a specific OM service.
+```bash
+ozone admin om roles -id=<om_service_id>
+```
+
+#### Application Job Configuration (e.g., Spark)
+
+When running application jobs, such as Spark, in a multi-cluster environment,
additional parameters are required to access remote Ozone clusters.
+
+To run a Spark shell job that accesses a remote cluster (e.g., `ozone2`), you
must specify the filesystem path in the `spark.yarn.access.hadoopFileSystems`
property:
+
+```bash
+spark-shell \
+  --conf "spark.yarn.access.hadoopFileSystems=ofs://ozone2"
+```
+
+In a Kerberos-enabled environment, YARN may try to renew delegation tokens
for the remote Ozone filesystem, causing jobs to fail with a token renewal
error.
+
+```text
+# Example token renewal error
+24/02/08 01:24:30 ERROR repl.Main: Failed to initialize Spark session.
+org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit
application_1707350431298_0007 to YARN : Failed to renew token: ...
+```
+
+To prevent this, exclude the remote filesystem from YARN's token renewal by
setting `spark.yarn.kerberos.renewal.excludeHadoopFileSystems`. A complete
`spark-shell` command for accessing a remote, Kerberized cluster includes both
properties:
+
+```bash
+spark-shell \
+  --conf "spark.yarn.access.hadoopFileSystems=ofs://ozone2" \
+  --conf "spark.yarn.kerberos.renewal.excludeHadoopFileSystems=ofs://ozone2"
+```
+
+### 4. Federated Cluster
+
+* **Topology:** Multiple OM services (managing distinct namespaces) share a
single, common SCM service and a common pool of Datanodes.
+* **Use Case:** Provides separation of metadata and authority at the
namespace level while managing storage as a single, large-scale resource pool.
+
+#### Federation Configuration
+
+In a federated setup, all OMs and Datanodes must be configured to communicate
with the same shared SCM service. This is achieved by setting the
`ozone.scm.service.ids` property in the `ozone-site.xml` of each OM and
Datanode.
+
+```xml
+ <property>
+ <name>ozone.scm.service.ids</name>
+ <value>scm-federation</value>
+ <tag>OZONE, SCM, HA</tag>
+ <description>
+ A comma-separated list of SCM service IDs. In a federated cluster,
+ this should point all OMs and Datanodes to the same SCM service
+ to enable the shared storage pool.
+ </description>
+ </property>
+```
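+
+The service ID by itself does not locate the SCM nodes. As a hedged sketch, assuming the standard SCM HA properties `ozone.scm.nodes.<serviceId>` and `ozone.scm.address.<serviceId>.<nodeId>`, each OM and Datanode would also carry entries along these lines. The node IDs and hostnames are hypothetical placeholders.
+
+```xml
+  <!-- Sketch only: node IDs (scm1..scm3) and hostnames are hypothetical. -->
+  <property>
+    <name>ozone.scm.nodes.scm-federation</name>
+    <value>scm1,scm2,scm3</value>
+  </property>
+  <property>
+    <name>ozone.scm.address.scm-federation.scm1</name>
+    <value>scm1.example.com</value>
+  </property>
+  <property>
+    <name>ozone.scm.address.scm-federation.scm2</name>
+    <value>scm2.example.com</value>
+  </property>
+  <property>
+    <name>ozone.scm.address.scm-federation.scm3</name>
+    <value>scm3.example.com</value>
+  </property>
+```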
diff --git a/static/img/OzoneClusterArchitectures.png
b/static/img/OzoneClusterArchitectures.png
new file mode 100644
index 000000000..7ae2989a7
Binary files /dev/null and b/static/img/OzoneClusterArchitectures.png differ