Re: [PR] HDDS-14396. [Website v2] [Docs] [Administrator Guide] Production Deployment [ozone-site]

via GitHub Mon, 19 Jan 2026 22:54:39 -0800


jojochuang commented on code in PR #260:
URL: https://github.com/apache/ozone-site/pull/260#discussion_r2706968963



##########
docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md:
##########
@@ -1,5 +1,71 @@
-# PLACEHOLDER
+---
+sidebar_label: Production Deployment
+---
 
-**TODO:** File a subtask under 
[HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this 
page or section.
+# Production Deployment
 
-There will be multiple pages on performance under this section. Not sure what 
is required yet.
+This document provides guidance on the requirements and best practices for a 
production deployment of Apache Ozone.
+
+## Ozone Components
+
+A typical production Ozone cluster includes the following services:
+
+- **Ozone Manager (OM)**: Manages the namespace and metadata of the Ozone 
cluster. A production cluster requires 3 OM instances for high availability.
+- **Storage Container Manager (SCM)**: Manages the data nodes and pipelines. A 
production cluster requires 3 SCM instances for high availability.
+- **Datanode**: Stores the actual data in containers. A production cluster 
requires at least 3 Datanodes.
+- **Recon**: A web-based UI for monitoring and managing the Ozone cluster. A 
Recon server is strongly recommended, though not required.
+- **S3 Gateway (S3G)**: An S3-compatible gateway for accessing Ozone. Multiple 
S3 Gateway instances are strongly recommended to load balance S3 traffic.
+- **HttpFS**: An HDFS-compatible API for accessing Ozone. This is an optional 
component.

Review Comment:
   ```suggestion
   - **HttpFS**: A WebHDFS-compatible API for accessing Ozone. This is an optional component.
   ```



##########
docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md:
##########
@@ -1,5 +1,71 @@
-# PLACEHOLDER
+---
+sidebar_label: Production Deployment
+---
 
-**TODO:** File a subtask under 
[HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this 
page or section.
+# Production Deployment
 
-There will be multiple pages on performance under this section. Not sure what 
is required yet.
+This document provides guidance on the requirements and best practices for a 
production deployment of Apache Ozone.
+
+## Ozone Components
+
+A typical production Ozone cluster includes the following services:
+
+- **Ozone Manager (OM)**: Manages the namespace and metadata of the Ozone 
cluster. A production cluster requires 3 OM instances for high availability.
+- **Storage Container Manager (SCM)**: Manages the data nodes and pipelines. A 
production cluster requires 3 SCM instances for high availability.
+- **Datanode**: Stores the actual data in containers. A production cluster 
requires at least 3 Datanodes.
+- **Recon**: A web-based UI for monitoring and managing the Ozone cluster. A 
Recon server is strongly recommended, though not required.
+- **S3 Gateway (S3G)**: An S3-compatible gateway for accessing Ozone. Multiple 
S3 Gateway instances are strongly recommended to load balance S3 traffic.
+- **HttpFS**: An HDFS-compatible API for accessing Ozone. This is an optional 
component.
+
+## Requirements
+
+### System Requirements
+
+- **Hardware**: Bare metal machines are recommended for optimal performance. 
Virtual machines or containers are not recommended for production deployments.
+- **Operating System**: Linux (recommended distributions: Red Hat 8/Rocky 8+, 
Ubuntu, SUSE; supported architectures: x86/ARM).
+- **Java Development Kit (JDK)**: Version 8 or higher.
+- **Time Synchronization**: A time synchronization service such as Chrony or 
ntpd must be enabled to prevent time drift.
+
+### Memory Requirements
+
+- **Ozone Manager (OM), Storage Container Manager (SCM), and Recon**: 
Recommended heap size in large production clusters is 64GB.
+- **Datanode, S3 Gateway, and HttpFS**: Recommended heap size is 31GB.
+
+### Storage Requirements
+
+- **Ozone Manager (OM), Storage Container Manager (SCM), and Recon Metadata 
Storage**: Use SAS SSD or NVMe SSD for metadata (RocksDB and Ratis) to ensure 
optimal performance. It is recommended to use RAID 1 (disk mirroring) for the 
metadata disks to protect against disk failures.
+- **Datanode Storage**:
+  - **Ratis Log**: Use SAS SSD or NVMe SSD for the Ratis log directory for low 
latency writes.
+  - **Container Data**: Hard disks are acceptable for container data storage.
+  - **Disk Configuration**: It is recommended to use a JBOD (Just a Bunch Of 
Disks) configuration instead of RAID. Ozone is a replicated distributed storage 
system and handles data redundancy. Using RAID can decrease performance without 
providing additional data protection benefits.
+- **Storage Type**: Use direct-attached storage. Do not use Network Attached 
Storage (NAS) or Storage Area Network (SAN).
+
+### Network Requirements
+
+- **Network Bandwidth**: A minimum of 25Gbps network card bandwidth is 
recommended.
+- **Network Topology**: A leaf-spine network topology with an oversubscription 
ratio below 3:1 is recommended for predictable performance.
+
+### Security Requirements (Optional but Recommended)
+
+- **Kerberos**: A Kerberos environment, including a Key Distribution Center 
(KDC), is recommended for enhanced security.
+
+## Recommended Configurations
+
+### Linux Kernel
+
+- **CPU Governor**: Set the CPU scaling driver to `performance` mode to 
maximize performance.
+- **Transparent Hugepage**: Disable Transparent Hugepage to avoid performance 
issues.
+- **SELinux**: Disable SELinux.
+- **Swappiness**: Set `vm.swappiness=1` to minimize swapping.
+
+### Local File System
+
+- **LVM**: Disable Logical Volume Manager (LVM) for data drives.
+- **File System**: Use `ext4` or `xfs` file systems.
+- **Mount Options**: Mount drives with the `noatime` option to reduce 
unnecessary disk writes. For SSDs, also add the `discard` option.
+
+### Ozone Configuration
+
+- **Monitoring**: Install Prometheus and Grafana for monitoring the Ozone 
cluster. For audit logs, consider using a log ingestion framework such as the 
ELK Stack (Elasticsearch, Logstash, and Kibana) with FileBeat, or other similar 
frameworks. Alternatively, you can use Apache Ranger to manage audit logs.
+- **Pipeline Limits**: Increase the number of allowed write pipelines to 
better suit your workload by adjusting `ozone.scm.datanode.pipeline.limit` and 
`ozone.scm.ec.pipeline.minimum`.

Review Comment:
   TODO: we don't provide values for these settings or a formula to calculate 
them. We should link to the Multi-Raft page, which has a formula for the Ratis 
pipeline setting `ozone.scm.datanode.pipeline.limit`. However, we don't 
describe `ozone.scm.ec.pipeline.minimum`.
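
To show the shape that a concrete example on the page might take, here is a minimal sketch that renders the two properties as a Hadoop-style `ozone-site.xml` fragment. The helper name `render_ozone_site` and the numeric values are hypothetical placeholders for illustration only. They are not recommendations and are not derived from the Multi-Raft formula.

```python
import xml.etree.ElementTree as ET


def render_ozone_site(properties):
    """Render a mapping of property names to values as Hadoop-style configuration XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Placeholder values only (assumptions, not tuning advice).
EXAMPLE_PIPELINE_SETTINGS = {
    "ozone.scm.datanode.pipeline.limit": 4,
    "ozone.scm.ec.pipeline.minimum": 8,
}

if __name__ == "__main__":
    print(render_ozone_site(EXAMPLE_PIPELINE_SETTINGS))
```

The printed fragment could be merged into `ozone-site.xml` once the page states how the real values should be chosen.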



##########
docs/05-administrator-guide/02-configuration/04-performance/01-placeholder.md:
##########
@@ -1,5 +1,71 @@
-# PLACEHOLDER
+---
+sidebar_label: Production Deployment
+---
 
-**TODO:** File a subtask under 
[HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this 
page or section.
+# Production Deployment
 
-There will be multiple pages on performance under this section. Not sure what 
is required yet.
+This document provides guidance on the requirements and best practices for a 
production deployment of Apache Ozone.
+
+## Ozone Components
+
+A typical production Ozone cluster includes the following services:
+
+- **Ozone Manager (OM)**: Manages the namespace and metadata of the Ozone 
cluster. A production cluster requires 3 OM instances for high availability.
+- **Storage Container Manager (SCM)**: Manages the data nodes and pipelines. A 
production cluster requires 3 SCM instances for high availability.
+- **Datanode**: Stores the actual data in containers. A production cluster 
requires at least 3 Datanodes.
+- **Recon**: A web-based UI for monitoring and managing the Ozone cluster. A 
Recon server is strongly recommended, though not required.
+- **S3 Gateway (S3G)**: An S3-compatible gateway for accessing Ozone. Multiple 
S3 Gateway instances are strongly recommended to load balance S3 traffic.
+- **HttpFS**: An HDFS-compatible API for accessing Ozone. This is an optional 
component.
+
+## Requirements
+
+### System Requirements
+
+- **Hardware**: Bare metal machines are recommended for optimal performance. 
Virtual machines or containers are not recommended for production deployments.
+- **Operating System**: Linux (recommended distributions: Red Hat 8/Rocky 8+, 
Ubuntu, SUSE; supported architectures: x86/ARM).
+- **Java Development Kit (JDK)**: Version 8 or higher.
+- **Time Synchronization**: A time synchronization service such as Chrony or 
ntpd must be enabled to prevent time drift.
+
+### Memory Requirements
+
+- **Ozone Manager (OM), Storage Container Manager (SCM), and Recon**: 
Recommended heap size in large production clusters is 64GB.
+- **Datanode, S3 Gateway, and HttpFS**: Recommended heap size is 31GB.
+
+### Storage Requirements
+
+- **Ozone Manager (OM), Storage Container Manager (SCM), and Recon Metadata 
Storage**: Use SAS SSD or NVMe SSD for metadata (RocksDB and Ratis) to ensure 
optimal performance. It is recommended to use RAID 1 (disk mirroring) for the 
metadata disks to protect against disk failures.
+- **Datanode Storage**:
+  - **Ratis Log**: Use SAS SSD or NVMe SSD for the Ratis log directory for low 
latency writes.
+  - **Container Data**: Hard disks are acceptable for container data storage.
+  - **Disk Configuration**: It is recommended to use a JBOD (Just a Bunch Of 
Disks) configuration instead of RAID. Ozone is a replicated distributed storage 
system and handles data redundancy. Using RAID can decrease performance without 
providing additional data protection benefits.
+- **Storage Type**: Use direct-attached storage. Do not use Network Attached 
Storage (NAS) or Storage Area Network (SAN).
+
+### Network Requirements
+
+- **Network Bandwidth**: A minimum of 25Gbps network card bandwidth is 
recommended.
+- **Network Topology**: A leaf-spine network topology with an oversubscription 
ratio below 3:1 is recommended for predictable performance.
+
+### Security Requirements (Optional but Recommended)
+
+- **Kerberos**: A Kerberos environment, including a Key Distribution Center 
(KDC), is recommended for enhanced security.
+
+## Recommended Configurations
+
+### Linux Kernel
+
+- **CPU Governor**: Set the CPU scaling driver to `performance` mode to 
maximize performance.
+- **Transparent Hugepage**: Disable Transparent Hugepage to avoid performance 
issues.
+- **SELinux**: Disable SELinux.
+- **Swappiness**: Set `vm.swappiness=1` to minimize swapping.
+
+### Local File System
+
+- **LVM**: Disable Logical Volume Manager (LVM) for data drives.
+- **File System**: Use `ext4` or `xfs` file systems.
+- **Mount Options**: Mount drives with the `noatime` option to reduce 
unnecessary disk writes. For SSDs, also add the `discard` option.
+
+### Ozone Configuration
+
+- **Monitoring**: Install Prometheus and Grafana for monitoring the Ozone 
cluster. For audit logs, consider using a log ingestion framework such as the 
ELK Stack (Elasticsearch, Logstash, and Kibana) with FileBeat, or other similar 
frameworks. Alternatively, you can use Apache Ranger to manage audit logs.

Review Comment:
   Refer to the Prometheus page and the Grafana page here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

