[ https://issues.apache.org/jira/browse/HDDS-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDDS-14389:
-----------------------------------
    Description: 
Our OM HA and SCM HA docs cover how multiple OMs reach consensus using Ratis.

However, they miss another important part of the story: client failover behavior.

HadoopRpcOMFailoverProxyProvider: used when the client-to-OM transport is Hadoop RPC. A failover or retry may happen if the OM is (1) not reachable, (2) not the leader, or (3) the leader but not yet ready to accept requests.

The client will fail over up to 500 times (ozone.client.failover.max.attempts), waiting 2 seconds between attempts (ozone.client.wait.between.retries.millis). If the contacted OM does not know the current leader, the client tries the next OM in round-robin fashion; otherwise, the client retries against the current leader.
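
For example, this behavior can be tuned in ozone-site.xml. A minimal sketch, using the default values described above:

{code:xml}
<!-- ozone-site.xml: client-to-OM failover tuning (values shown are the defaults) -->
<property>
  <name>ozone.client.failover.max.attempts</name>
  <value>500</value>
</property>
<property>
  <name>ozone.client.wait.between.retries.millis</name>
  <value>2000</value>
</property>
{code}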

Additionally, it is crucial to ensure that clients and OMs have consistent node mapping configurations; otherwise, failover may never reach the leader OM.
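
For reference, a minimal OM HA node mapping sketch in ozone-site.xml (the service id "omservice" and the hostnames are hypothetical); the same mapping must be visible to both the OMs and their clients:

{code:xml}
<!-- ozone-site.xml: OM HA node mapping (hypothetical service id and hosts) -->
<property>
  <name>ozone.om.service.ids</name>
  <value>omservice</value>
</property>
<property>
  <name>ozone.om.nodes.omservice</name>
  <value>om1,om2,om3</value>
</property>
<property>
  <name>ozone.om.address.omservice.om1</name>
  <value>om1.example.com</value>
</property>
<property>
  <name>ozone.om.address.omservice.om2</name>
  <value>om2.example.com</value>
</property>
<property>
  <name>ozone.om.address.omservice.om3</name>
  <value>om3.example.com</value>
</property>
{code}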

GrpcOMFailoverProxyProvider: if the client-to-OM transport is gRPC, the behavior is largely the same. I don't have much experience with it, though, so I'll leave it at that for now.

Similarly, failover from a client (an Ozone client, OM, or Datanode) to SCM is controlled by a series of configuration properties in SCMClientConfig: hdds.scmclient.rpc.timeout, hdds.scmclient.max.retry.timeout, hdds.scmclient.failover.max.retry, and hdds.scmclient.failover.retry.interval.
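
As a sketch, these could be set in ozone-site.xml as below. The values are illustrative only (I have not verified the defaults); they just show the shape of the configuration:

{code:xml}
<!-- ozone-site.xml: SCM client failover tuning (illustrative values, not defaults) -->
<property>
  <name>hdds.scmclient.rpc.timeout</name>
  <value>15m</value>
</property>
<property>
  <name>hdds.scmclient.max.retry.timeout</name>
  <value>10m</value>
</property>
<property>
  <name>hdds.scmclient.failover.max.retry</name>
  <value>15</value>
</property>
<property>
  <name>hdds.scmclient.failover.retry.interval</name>
  <value>2s</value>
</property>
{code}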

Having these behaviors documented will help users troubleshoot problems.


> [Doc] OM HA, SCM HA failover behavior
> -------------------------------------
>
>                 Key: HDDS-14389
>                 URL: https://issues.apache.org/jira/browse/HDDS-14389
>             Project: Apache Ozone
>          Issue Type: Task
>          Components: documentation
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>


