sodonnel commented on code in PR #8947:
URL: https://github.com/apache/ozone/pull/8947#discussion_r2282269603
##########
hadoop-hdds/docs/content/feature/Topology.md:
##########
@@ -104,78 +107,53 @@ Uses an external script to resolve rack locations for IPs.

 **Topology Mapping Best Practices:**

-* **Accuracy:** Mappings must be accurate and current.
-* **Static Mapping:** Simpler for small, stable clusters; requires manual updates.
-* **Dynamic Mapping:** Flexible for large/dynamic clusters. Script performance, correctness, and reliability are vital; ensure it's idempotent and handles batch lookups efficiently.
-
-## Pipeline Choosing Policies
-
-Ozone supports several policies for selecting a pipeline when placing containers. The policy for Ratis containers is configured by the property `hdds.scm.pipeline.choose.policy.impl` for SCM. The policy for EC (Erasure Coded) containers is configured by the property `hdds.scm.ec.pipeline.choose.policy.impl`. For both, the default value is `org.apache.hadoop.hdds.scm.pipeline.choose.algorithms.RandomPipelineChoosePolicy`.
-
-These policies help optimize for different goals such as load balancing, health, or simplicity:
-
-- **RandomPipelineChoosePolicy** (Default): Selects a pipeline at random from the available list, without considering utilization or health. This policy is simple and does not optimize for any particular metric.
+* **Accuracy:** Mappings must be accurate and current.
+* **Static Mapping:** Simpler for small, stable clusters; requires manual updates.
+* **Dynamic Mapping:** Flexible for large/dynamic clusters. Script performance, correctness, and reliability are vital; ensure it's idempotent and handles batch lookups efficiently.

-- **CapacityPipelineChoosePolicy**: Picks two random pipelines and selects the one with lower utilization, favoring pipelines with more available capacity and helping to balance the load across the cluster.
+## Placement and Selection Policies

-- **RoundRobinPipelineChoosePolicy**: Selects pipelines in a round-robin order. This policy is mainly used for debugging and testing, ensuring even distribution but not considering health or capacity.
+Ozone uses three distinct types of policies to manage how and where data is written.

-- **HealthyPipelineChoosePolicy**: Randomly selects pipelines but only returns a healthy one. If no healthy pipeline is found, it returns the last tried pipeline as a fallback.
+### 1. Pipeline Creation Policy

-These policies can be configured to suit different deployment needs and workloads.
+This policy selects a set of datanodes to form a new pipeline. Its purpose is to ensure new pipelines are internally fault-tolerant by spreading their nodes across racks. This is the primary mechanism for topology awareness on the write path for open containers.

-## Container Placement Policies for Replicated (RATIS) Containers
+The policy is configured by the `ozone.scm.pipeline.placement.impl` property in `ozone-site.xml`.

-SCM uses a pluggable policy to place additional replicas of *closed* RATIS-replicated containers. This is configured using the `ozone.scm.container.placement.impl` property in `ozone-site.xml`. Available policies are found in the `org.apache.hadoop.hdds.scm.container.placement.algorithms` package \[1, 3\].
+* **`SCMContainerPlacementRackAware` (Default)**

Review Comment:
   I am also unsure if any other policy has been tested for use on pipeline creation, as the other policies won't take the number of pipelines per node into account.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
