jojochuang commented on code in PR #313:
URL: https://github.com/apache/ozone-site/pull/313#discussion_r2767649345
##########
docs/05-administrator-guide/01-installation/03-hardware-and-sizing.md:
##########
@@ -1,3 +1,150 @@
 # Hardware and Sizing
 
-**TODO:** File a subtask under [HDDS-9859](https://issues.apache.org/jira/browse/HDDS-9859) and complete this page or section.
+This guide outlines the hardware requirements and sizing recommendations for Apache Ozone clusters of different scales. Proper hardware selection is critical for achieving optimal performance, reliability, and cost-effectiveness.
+
+Note: Apache Ozone can run on a single node inside Kubernetes and serve all functionality for development, testing, and small workloads. The hardware specifications in this guide reflect common configurations for production deployments. Your choice of hardware should depend on your target scale, performance requirements, and workload characteristics.
+
+## Principles
+
+When planning an Ozone deployment, consider these key principles:
+
+- **Separate Metadata and Data Hardware**: Metadata services (OM, SCM) have different requirements than data services (Datanodes).
+- **SSD/NVMe for Metadata**: All metadata services require fast storage for RocksDB.
+- **Scale Metadata Vertically**: Add more resources to existing metadata nodes rather than adding more nodes.
+- **Scale Datanodes Horizontally**: Add more Datanode machines as capacity and throughput needs grow.
+- **Plan for Failure**: Size the cluster to handle expected failures of drives and nodes. Do not exceed 400 TB of raw capacity per Datanode.
+
+## Guidelines
+
+### Hardware Configuration Best Practices
+
+#### Drive Configuration
+
+- Use enterprise-class drives in production environments.
+- Use SAS HDDs for Datanode data storage.
+- Use SSD/NVMe drives optimized for mixed workloads as system drives.
+- Use NVMe or SAS SSDs optimized for write-heavy workloads for metadata and Ratis logs.
+- Use hardware RAID1 for system drives and metadata storage.
+- Leave at least 20% free space on metadata drives for RocksDB compaction.
+- Factor drive failure rates (typically 1–5% annually) into capacity planning, and use SMART to monitor drive health.
+
+#### Memory Configuration
+
+- Reserve at least 4–8 GB for the OS and other services on each node.
+- Budget memory to avoid swap usage, but do not disable swap entirely; keeping swap available prevents the OOM killer from terminating critical processes unpredictably.
+- For co-located services (OM + SCM), size the heap to accommodate both services plus overhead.
+- Use the G1GC collector for production JVMs.
+
+#### Expansion Planning
+
+- Design racks with expansion capacity in mind

Review Comment:
   If the cluster plans to store erasure-coded buckets, it is highly recommended to spread the cluster across at least (number of data blocks + number of parity blocks) racks to tolerate rack-level failures. For example, to store files with the RS-6-3-1024k EC scheme, the cluster should be distributed across 9 or more racks.
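To make the review comment's arithmetic concrete, here is a minimal sketch of the rack-count rule of thumb it describes; the `EcRackPlanning` class and `minRacks` helper are illustrative inventions for this note, not code from the Ozone codebase:

```java
// Illustrative sketch only -- not part of Apache Ozone.
public class EcRackPlanning {

    // For a Reed-Solomon RS(data, parity) scheme, a block group must
    // place each of its data + parity blocks on a distinct rack to
    // tolerate rack-level failures, so the cluster needs at least
    // dataBlocks + parityBlocks racks.
    static int minRacks(int dataBlocks, int parityBlocks) {
        return dataBlocks + parityBlocks;
    }

    public static void main(String[] args) {
        // RS-6-3-1024k: 6 data blocks + 3 parity blocks => 9 or more racks.
        System.out.println("RS-6-3 needs at least " + minRacks(6, 3) + " racks");
    }
}
```

With one block per rack in such a layout, the cluster can lose up to the parity count of racks (3 for RS-6-3) and still reconstruct the data.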
