echonesis commented on code in PR #202: URL: https://github.com/apache/ozone-site/pull/202#discussion_r2656876602
########## docs/05-administrator-guide/02-configuration/04-performance/03-multi-raft.md: ########## @@ -0,0 +1,106 @@ +--- +sidebar_label: Multi-Raft +--- + +# Multi-Raft + +## Multi-Raft in Datanodes + +Multi-Raft support allows a Datanode to participate +multiple Ratis replication groups (pipelines) at the same time +for improving write throughput and ensuring better utilization +of disk and network resources. +This is particularly useful when Datanodes have multiple disks +or the network has a very high bandwidth. + +### Background + +The early Ozone versions supported only one Raft pipeline per Datanode. +This limited its concurrent write handling capacity for replicated data +and led to under-utilization of resources. +The use of Multi-Raft tremendously improved the resource utilization. + +## Prerequisites + +- Ozone 0.5.0 or later +- Replication type: Ratis +- Multiple metadata directories on each Datanode + (via `hdds.container.ratis.datanode.storage.dir`) + - Adequate CPU, memory, and network bandwidth + +## How It Works + +SCM can now create overlapping pipelines: +each Datanode can join multiple Raft groups +up to a configurable limit. +This boosts concurrency and avoids idle nodes and idle disks. +Raft logs are stored separately on different metadata directories +in order to reduce disk contention. +Ratis handles concurrent logs per node. + +## Configuration + +- `hdds.container.ratis.datanode.storage.dir` (no default) + - A list of metadata directory paths. +- `ozone.scm.datanode.pipeline.limit` (default: 2) + - The maximum number of pipelines per Datanode can be engaged in. + The value 0 means that the pipeline limit per Datanode will be determined + by the number of metadata disks reported per Datanode; + see the next property. +- `ozone.scm.pipeline.per.metadata.disk` (default: 2) + - The maximum number of pipelines for a Datanode is determined + by the number of disks in that Datanode. + This property is effective only when the previous property is set to 0. + The value of this property must be greater than 0. + +## How to Use + +1. Configure Datanode metadata directories: + + ```xml + <property> + <name>hdds.container.ratis.datanode.storage.dir</name> + <value>/disk1/ratis,/disk2/ratis,/disk3/ratis,/disk4/ratis</value> + </property> + ``` + +2. Set pipeline limits in `ozone-site.xml`: + + ```xml + <property> + <name>ozone.scm.datanode.pipeline.limit</name> + <value>0</value> + </property> + <property> + <name>ozone.scm.pipeline.per.metadata.disk</name> + <value>2</value> + </property> + ``` + +3. Restart SCM and Datanodes. +4. Validate with: + + ```bash + ozone admin pipeline list + ozone admin datanode list + ``` + +## Operational Tips + +- Monitor with `ozone admin` CLI and the Recon UI. +- Ensure pipeline count matches expectations. + For optimal I/O isolation, + configure `hdds.container.ratis.datanode.storage.dir` + with paths on multiple distinct physical disks. + The Ratis pipelines will be distributed accordingly. + - Be cautious with very high pipeline counts due to memory/CPU overhead. + +## Limitations + +- Global configuration: cannot set per-node limits +- Requires restart after changing storage dirs +- No effect on Erasure-Coding (EC) pipelines + +## References + +- Design doc: [HDDS-1564 Ozone multi-raft support](https://ozone.apache.org/docs/edge/design/multiraft.html) Review Comment: Thanks for pointing it out! I will update it once the design doc is migrated to Doc V2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
