[ 
https://issues.apache.org/jira/browse/HDDS-14389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-14389:
-----------------------------------
    Description: 
Our OM HA and SCM HA docs cover how multiple OMs reach consensus using Ratis.

However, they miss another important part of the story: client failover behavior.

Under normal circumstances, a client always submits requests to the leader OM. If it 
does not know which OM is the leader, it starts by sending the request to the 
first OM in the configuration and retries the other OMs until a leader is found.
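
For illustration, here is a minimal, hedged sketch of the client-side OM HA 
configuration that the failover logic iterates over. The service id, node ids, and 
hostnames are hypothetical, and it assumes the usual ozone.om.service.ids / 
ozone.om.nodes / ozone.om.address keys and the 
OzoneClientFactory.getRpcClient(serviceId, conf) overload:

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;

public class OmHaClientExample {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();

    // Hypothetical OM service id and node mapping; the failover proxy
    // provider walks this list of OMs when it looks for the leader.
    conf.set("ozone.om.service.ids", "omservice");
    conf.set("ozone.om.nodes.omservice", "om1,om2,om3");
    conf.set("ozone.om.address.omservice.om1", "om1.example.com:9862");
    conf.set("ozone.om.address.omservice.om2", "om2.example.com:9862");
    conf.set("ozone.om.address.omservice.om3", "om3.example.com:9862");

    // Leader discovery and failover happen inside the client; the
    // application simply issues requests against the service id.
    try (OzoneClient client = OzoneClientFactory.getRpcClient("omservice", conf)) {
      client.getObjectStore().listVolumes("");
    }
  }
}
{code}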

 

HadoopRpcOMFailoverProxyProvider: used when the client-to-OM transport is Hadoop 
RPC. A failover or retry may happen if (1) the OM is not reachable, (2) the OM is 
not the leader, or (3) the OM is the leader but not yet ready to accept requests.

Failover is retried up to 500 times (ozone.client.failover.max.attempts), with 
2 seconds between each failover retry 
(ozone.client.wait.between.retries.millis). If the OM is not aware of the 
current leader, the client tries the next OM in round-robin fashion; otherwise, 
the client retries against the current leader.
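
As a hedged illustration (the values are arbitrary examples, not recommendations; 
the same keys can equally be set in ozone-site.xml), these two knobs can be 
overridden programmatically:

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class OmFailoverTuningExample {
  // Hedged sketch: client-side OM failover retry tuning.
  // The values below are illustrative only.
  static OzoneConfiguration tunedConf() {
    OzoneConfiguration conf = new OzoneConfiguration();
    conf.setInt("ozone.client.failover.max.attempts", 15);           // fewer attempts than the 500 default
    conf.setLong("ozone.client.wait.between.retries.millis", 5000L); // wait 5 seconds between failovers
    return conf;
  }
}
{code}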

Additionally, it is crucial to ensure that clients and OMs have consistent node 
mapping configurations; otherwise, failover may never reach the leader OM.

GrpcOMFailoverProxyProvider: if the client-to-OM transport is gRPC, the behavior is 
largely the same. I don't have much experience with it, so I'll leave it at that 
for now.

Similarly, failover from a client (an Ozone client, an OM, or a Datanode) to SCM is 
controlled by a set of configuration properties in SCMClientConfig: 
hdds.scmclient.rpc.timeout, hdds.scmclient.max.retry.timeout, 
hdds.scmclient.failover.max.retry, and hdds.scmclient.failover.retry.interval.
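
As a hedged sketch, these can be overridden on the client side as well. The property 
names are the ones listed above; the values, and the assumption that the 
timeout/interval properties accept time-with-unit strings, are illustrative only:

{code:java}
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class ScmClientTuningExample {
  // Hedged sketch: SCM client failover/retry tuning. Property names come
  // from SCMClientConfig; the values and the duration format are assumptions.
  static OzoneConfiguration tunedConf() {
    OzoneConfiguration conf = new OzoneConfiguration();
    conf.set("hdds.scmclient.rpc.timeout", "15m");
    conf.set("hdds.scmclient.max.retry.timeout", "10m");
    conf.setInt("hdds.scmclient.failover.max.retry", 15);
    conf.set("hdds.scmclient.failover.retry.interval", "2s");
    return conf;
  }
}
{code}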

 

All of this happens transparently to client applications. An exception is only 
bubbled up to the client application once all retry attempts have been exhausted.

 

Having these behaviors documented will help users troubleshoot problems.



> [Website v2] [Docs] [Administrator Guide] OM HA, SCM HA failover behavior
> -------------------------------------------------------------------------
>
>                 Key: HDDS-14389
>                 URL: https://issues.apache.org/jira/browse/HDDS-14389
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gargi Jaiswal
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
