sodonnel commented on code in PR #8947: URL: https://github.com/apache/ozone/pull/8947#discussion_r2282275946
##########
hadoop-hdds/docs/content/feature/Topology.md:
##########
@@ -104,78 +107,53 @@ Uses an external script to resolve rack locations for IPs.
 **Topology Mapping Best Practices:**

-* **Accuracy:** Mappings must be accurate and current.
-* **Static Mapping:** Simpler for small, stable clusters; requires manual updates.
-* **Dynamic Mapping:** Flexible for large/dynamic clusters. Script performance, correctness, and reliability are vital; ensure it's idempotent and handles batch lookups efficiently.
-
-## Pipeline Choosing Policies
-
-Ozone supports several policies for selecting a pipeline when placing containers. The policy for Ratis containers is configured by the property `hdds.scm.pipeline.choose.policy.impl` for SCM. The policy for EC (Erasure Coded) containers is configured by the property `hdds.scm.ec.pipeline.choose.policy.impl`. For both, the default value is `org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.RandomPipelineChoosePolicy`.
-
-These policies help optimize for different goals such as load balancing, health, or simplicity:
-
-- **RandomPipelineChoosePolicy** (Default): Selects a pipeline at random from the available list, without considering utilization or health. This policy is simple and does not optimize for any particular metric.
+* **Accuracy:** Mappings must be accurate and current.
+* **Static Mapping:** Simpler for small, stable clusters; requires manual updates.
+* **Dynamic Mapping:** Flexible for large/dynamic clusters. Script performance, correctness, and reliability are vital; ensure it's idempotent and handles batch lookups efficiently.

-- **CapacityPipelineChoosePolicy**: Picks two random pipelines and selects the one with lower utilization, favoring pipelines with more available capacity and helping to balance the load across the cluster.
+## Placement and Selection Policies

-- **RoundRobinPipelineChoosePolicy**: Selects pipelines in a round-robin order. This policy is mainly used for debugging and testing, ensuring even distribution but not considering health or capacity.
+Ozone uses three distinct types of policies to manage how and where data is written.

-- **HealthyPipelineChoosePolicy**: Randomly selects pipelines but only returns a healthy one. If no healthy pipeline is found, it returns the last tried pipeline as a fallback.
+### 1. Pipeline Creation Policy

-These policies can be configured to suit different deployment needs and workloads.
+This policy selects a set of datanodes to form a new pipeline. Its purpose is to ensure new pipelines are internally fault-tolerant by spreading their nodes across racks. This is the primary mechanism for topology awareness on the write path for open containers.

-## Container Placement Policies for Replicated (RATIS) Containers
+The policy is configured by the `ozone.scm.pipeline.placement.impl` property in `ozone-site.xml`.

-SCM uses a pluggable policy to place additional replicas of *closed* RATIS-replicated containers. This is configured using the `ozone.scm.container.placement.impl` property in `ozone-site.xml`. Available policies are found in the `org.apache.hadoop.hdds.scm.container.placement.algorithms` package \[1, 3\].
+* **`SCMContainerPlacementRackAware` (Default)**
+    * **Function:** Distributes the datanodes of a pipeline across racks for fault tolerance (e.g., for a 3-node pipeline, it aims for at least two racks). Similar to HDFS placement. [1]
+    * **Use Cases:** Production clusters needing rack-level fault tolerance.
+    * **Limitations:** Designed for single-layer rack topologies (e.g., `/rack/node`). Not recommended for multi-layer hierarchies (e.g., `/dc/row/rack/node`) as it may not interpret deeper levels correctly. [1]

-These policies are applied when SCM needs to re-replicate containers, such as during container balancing.
+* **`SCMContainerPlacementRandom`**
+    * **Function:** Randomly selects healthy, available DataNodes, ignoring rack topology. [1, 4]
+    * **Use Cases:** Small/dev/test clusters where rack fault tolerance is not critical.

-### 1. `SCMContainerPlacementRackAware` (Default)
+* **`SCMContainerPlacementCapacity`**
+    * **Function:** Selects DataNodes by available capacity (favors lower disk utilization) to balance disk usage across the cluster. [5, 6]
+    * **Use Cases:** Heterogeneous storage clusters or where even disk utilization is key.

-* **Function:** Distributes replicas across racks for fault tolerance (e.g., for 3 replicas, aims for at least two racks). Similar to HDFS placement. \[1]
-* **Use Cases:** Production clusters needing rack-level fault tolerance.
-* **Configuration:**
-  ```xml
-  <property>
-    <name>ozone.scm.container.placement.impl</name>
-    <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
-  </property>
-  ```
-* **Best Practices:** Requires accurate topology mapping.
-* **Limitations:** Designed for single-layer rack topologies (e.g., `/rack/node`). Not recommended for multi-layer hierarchies (e.g., `/dc/row/rack/node`) as it may not interpret deeper levels correctly. \[1]
+### 2. Pipeline Selection (Load Balancing) Policy

-### 2. `SCMContainerPlacementRandom`
+After a pool of healthy, open, and rack-aware pipelines has been created, this policy is used to **select one** of them to handle a client's write request. Its purpose is **load balancing**, not topology awareness, as the topology has already been handled during pipeline creation.

-* **Function:** Randomly selects healthy, available DataNodes meeting basic criteria (space, no existing replica), ignoring rack topology. \[1, 4\]
-* **Use Cases:** Small/dev/test clusters, or if rack fault tolerance for closed replicas isn't critical.
-* **Configuration:**
-  ```xml
-  <property>
-    <name>ozone.scm.container.placement.impl</name>
-    <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom</value>
-  </property>
-  ```
-* **Best Practices:** Not for production needing rack failure resilience.
+The policy is configured by `hdds.scm.pipeline.choose.policy.impl` in `ozone-site.xml`.

-### 3. `SCMContainerPlacementCapacity`
+* **`RandomPipelineChoosePolicy` (Default):** Selects a pipeline at random from the available list. This policy is simple and distributes load without considering other metrics.
+* **`CapacityPipelineChoosePolicy`:** Picks two random pipelines and selects the one with lower utilization, favoring pipelines with more available capacity.
+* **`RoundRobinPipelineChoosePolicy`:** Selects pipelines in a round-robin order. This is mainly for debugging and testing.
+* **`HealthyPipelineChoosePolicy`:** Randomly selects pipelines but only returns a healthy one.

-* **Function:** Selects DataNodes by available capacity (favors lower disk utilization) to balance disk usage. \[5, 6\]
-* **Use Cases:** Heterogeneous storage clusters or where even disk utilization is key.
-* **Configuration:**
-  ```xml
-  <property>
-    <name>ozone.scm.container.placement.impl</name>
-    <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementCapacity</value>
-  </property>
-  ```
-* **Best Practices:** Prevents uneven node filling.
-* **Interaction:** This container placement policy selects datanodes by randomly picking two nodes from a pool of healthy, available nodes and then choosing the one with lower utilization (more free space). This approach aims to distribute containers more evenly across the cluster over time, favoring less utilized nodes without overwhelming newly added nodes.
+### 3. Closed Container Replication Policy

+This policy is used only when SCM needs to create an **additional replica of a closed container**. This happens during re-replication (after a node failure) or container balancing. Its scope is narrow compared to the pipeline creation and selection policies.

+This is configured using the `ozone.scm.container.placement.impl` property in `ozone-site.xml`. The available policies are the same as for Pipeline Creation (e.g., `SCMContainerPlacementRackAware`, `SCMContainerPlacementRandom`).

Review Comment:
   This is correct, but I am not sure the policies mentioned above are valid for pipeline creation, in so much as they have probably never been tested. Might be best to move the list from above to here and then not mention them above.
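For reference, a minimal sketch of how the two settings that are not in question here might look in `ozone-site.xml`. The property names and class names are taken from the diff above, and the values are the defaults that the removed text states. The `ozone.scm.pipeline.placement.impl` property is deliberately left out, since the review comment doubts these policies were ever tested for pipeline creation. The enclosing `<configuration>` element is the usual Hadoop-style wrapper and is assumed rather than shown in the diff.

```xml
<!-- Sketch only: values are the defaults named in the diff above. -->
<configuration>
  <!-- Pipeline selection (load balancing) policy for Ratis containers -->
  <property>
    <name>hdds.scm.pipeline.choose.policy.impl</name>
    <value>org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.RandomPipelineChoosePolicy</value>
  </property>
  <!-- Placement of additional replicas of closed containers -->
  <property>
    <name>ozone.scm.container.placement.impl</name>
    <value>org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware</value>
  </property>
</configuration>
```

If the list of `SCMContainerPlacement*` policies moves to the closed-container section as suggested, only the second `<value>` would vary between the documented options.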
