Re: [PR] HDDS-14389. [Website v2] [Docs] [Administrator Guide] OM HA, SCM HA failover behavior [ozone-site]

via GitHub Tue, 27 Jan 2026 09:39:18 -0800


jojochuang commented on code in PR #296:
URL: https://github.com/apache/ozone-site/pull/296#discussion_r2733106952



##########
docs/05-administrator-guide/02-configuration/06-high-availability/03-client-failover.md:
##########
@@ -0,0 +1,69 @@
+---
+sidebar_label: Client Failover
+---
+
+# HA Client Failover
+
+## Overview
+
+This document describes how Ozone clients handle failover and retry logic to 
ensure high availability and reliability. In Ozone's high availability (HA) 
setup, clients need to automatically fail over between multiple service 
instances (Ozone Manager, Storage Container Manager) and retry operations when 
encountering failures with Datanodes.
+
+The failover and retry mechanisms operate transparently to client 
applications. Clients automatically detect failures, switch to alternative 
service instances, and retry operations according to configurable policies. An 
exception is only raised to the application layer after all retry attempts have 
been exhausted.
+
+## Client to Ozone Manager Failover
+
+Clients always submit requests to the leader Ozone Manager (OM). If the 
`leader` is `unknown`, clients start by sending requests to the first OM in the 
configuration and retry other OMs until a leader is found.
+
+### 1. Hadoop RPC Transport (`HadoopRpcOMFailoverProxyProvider`)
+
+When the client talks to the OM over the Hadoop RPC transport 
(`HadoopRpcOMFailoverProxyProvider`), failover or retry may happen if the OM:
+
+- is not reachable,
+- is not the leader, or
+- is the leader but not ready to accept requests.
+
+The failover mechanism retries up to **500 times** 
(`ozone.client.failover.max.attempts`), with **2 seconds** between each 
failover retry (`ozone.client.wait.between.retries.millis`).
+If an OM is not aware of the current leader, the client tries the next OM in 
round-robin fashion. Otherwise, the client retries contacting the current 
leader.
+
+Additionally, it is crucial to ensure that clients and OMs have consistent node 
mapping configurations; otherwise, failover may not reach the leader OM.
+
+### 2. gRPC Transport
+
+When using gRPC transport (`GrpcOMFailoverProxyProvider`), the failover 
behavior is similar to Hadoop RPC transport, using the same retry policies and 
configuration parameters.
+
+## Client to Storage Container Manager Failover
+
+Client (client, OM, or Datanode) to SCM failover is controlled by 
configuration properties in `SCMClientConfig`. Clients try to connect to the 
leader SCM.
+If the SCM provides a suggested leader in the exception, the client fails over 
to that leader. Otherwise, the client tries the next SCM in round-robin fashion.
+
+The failover configuration properties are:
+
+| Property | Default | Description |
+|----------|-------|-------------|
+| `hdds.scmclient.rpc.timeout` | 15min | RPC timeout for SCM. If 
`ipc.client.ping` is set to true and this RPC timeout is greater than the value 
of `ipc.ping.interval`, the effective RPC timeout is rounded up to a 
multiple of `ipc.ping.interval`. |
+| `hdds.scmclient.max.retry.timeout` | 10min | Maximum retry timeout for SCM 
Client. |
+| `hdds.scmclient.failover.max.retry` | 15    | Maximum retry count for SCM 
Client when failover happens. If `maxRetryTimeout / retryInterval` is larger 
than this value, the calculated value is used instead. |
+| `hdds.scmclient.failover.retry.interval` | 2s    | Time to wait between 
retry attempts against another SCM address. |
+
+## Client to Datanode Failover and Retry
+
+Clients retry Datanodes in order upon failure. The retry behavior differs for 
read and write operations:

Review Comment:
   Maybe add a little clarification here; otherwise it sounds like a client 
retries every Datanode in the cluster, which is not the case. Instead, it 
retries only the Datanodes in the same pipeline.
   
   
   ```suggestion
   Clients retry Datanodes in the pipeline in order upon failure. In other 
words, a client accessing a block that belongs to a RATIS/3 pipeline may 
retry up to 3 Datanodes. The retry behavior differs for read and write 
operations:
   ```
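To make the pipeline-scoped retry concrete, here is a minimal sketch in Python. It is not Ozone client code: `read_with_pipeline_retry`, `AllReplicasFailed`, and the `read_from` callback are hypothetical names introduced only for illustration, and using `OSError` as the failure signal is an assumption. The point it shows is that the retry loop is bounded by the pipeline's members (3 for RATIS/3), not by the cluster size.

```python
from typing import Callable, List, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Hypothetical error raised once every Datanode in the pipeline was tried."""


def read_with_pipeline_retry(
    pipeline: Sequence[str],
    read_from: Callable[[str], bytes],
) -> bytes:
    """Try each Datanode of ONE pipeline in order; never go outside the pipeline."""
    errors: List[Tuple[str, OSError]] = []
    for datanode in pipeline:
        try:
            return read_from(datanode)
        except OSError as exc:  # assumed failure signal for this sketch
            errors.append((datanode, exc))
    # Only after every pipeline member has failed does the caller see an error.
    raise AllReplicasFailed(errors)


def flaky_read(datanode: str) -> bytes:
    """Stand-in for a block read: only the third replica is reachable."""
    if datanode != "dn3":
        raise OSError(f"{datanode} unreachable")
    return b"block-data"


# A RATIS/3 pipeline has three members, so at most three attempts are made.
data = read_with_pipeline_retry(["dn1", "dn2", "dn3"], flaky_read)
```

With the stand-in above, the first two attempts fail and the third returns the data, so the application never sees the intermediate errors.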


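As a side note on the defaults quoted in the diff, the retry budgets they imply can be worked out with simple arithmetic. The sketch below takes the documented values at face value; the function names are hypothetical, the SCM rule is a literal reading of the table's description rather than a copy of the actual implementation, and the OM figure counts only the wait between attempts, not the duration of each RPC.

```python
def om_failover_wait_seconds(max_attempts: int = 500, wait_ms: int = 2000) -> float:
    """Total wait between OM failover attempts, per the documented defaults.

    max_attempts ~ ozone.client.failover.max.attempts
    wait_ms      ~ ozone.client.wait.between.retries.millis
    """
    return max_attempts * wait_ms / 1000.0


def scm_effective_max_retry(
    max_retry: int = 15,             # hdds.scmclient.failover.max.retry
    max_retry_timeout_s: int = 600,  # hdds.scmclient.max.retry.timeout (10min)
    retry_interval_s: int = 2,       # hdds.scmclient.failover.retry.interval
) -> int:
    """Literal reading of the table: the larger of the two values is used."""
    calculated = max_retry_timeout_s // retry_interval_s
    return max(max_retry, calculated)


om_wait = om_failover_wait_seconds()     # 500 * 2 s = 1000.0 seconds
scm_retries = scm_effective_max_retry()  # max(15, 600 / 2) = 300
```

Under these assumptions, an OM client could spend roughly 1000 seconds (about 17 minutes) waiting before giving up, and the SCM default of 15 retries is effectively superseded by the calculated value of 300.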

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

