jojochuang commented on code in PR #296: URL: https://github.com/apache/ozone-site/pull/296#discussion_r2733106952
##########
docs/05-administrator-guide/02-configuration/06-high-availability/03-client-failover.md:
##########
@@ -0,0 +1,69 @@
+---
+sidebar_label: Client Failover
+---
+
+# HA Client Failover
+
+## Overview
+
+This document describes how Ozone clients handle failover and retry logic to ensure high availability and reliability. In Ozone's high availability (HA) setup, clients need to automatically failover between multiple service instances (Ozone Manager, Storage Container Manager) and retry operations when encountering failures with Datanodes.
+
+The failover and retry mechanisms operate transparently to client applications. Clients automatically detect failures, switch to alternative service instances, and retry operations according to configurable policies. An exception is only raised to the application layer after all retry attempts have been exhausted.
+
+## Client to Ozone Manager Failover
+
+Clients always submit requests to the leader Ozone Manager (OM). If the `leader` is `unknown`, clients start by sending requests to the first OM in the configuration and retries other OMs until a leader is found.
+
+### 1. Hadoop RPC Transport (`HadoopRpcOMFailoverProxyProvider`)
+
+If client to OM is Hadoop RPC transport(`HadoopRpcOMFailoverProxyProvider`), failover or retry may happen if the OM:
+
+- is not reachable,
+- is not the leader, or
+- is the leader but not ready to accept requests.
+
+The failover mechanism retries up to **500 times** (`ozone.client.failover.max.attempts`), with **2 seconds** between each failover retry (`ozone.client.wait.between.retries.millis`).
+If an OM is not aware of the current leader, the client tries the next OM in round-robin fashion. Otherwise, the client retries contacting the current leader.
+
+Additionally, it is crucial to ensure clients and OM have consistent node mapping configurations, otherwise failover may not reach the leader OM.
+
+### 2. gRPC Transport
+
+When using gRPC transport (`GrpcOMFailoverProxyProvider`), the failover behavior is similar to Hadoop RPC transport, using the same retry policies and configuration parameters.
+
+## Client to Storage Container Manager Failover
+
+Client (client, OM, or Datanode) to SCM failover is controlled by configuration properties in `SCMClientConfig`. Clients try to connect to the leader SCM.
+If the SCM provides a suggested leader in the exception, the client fails over to that leader. Otherwise, the client tries the next SCM in round-robin fashion.
+
+The failover configuration properties are:
+
+| Property | Default | Description |
+|----------|-------|-------------|
+| `hdds.scmclient.rpc.timeout` | 15min | RPC timeout for SCM. If `ipc.client.ping` is set to true and this RPC-timeout is greater than the value of `ipc.ping.interval`, the effective value of the RPC-timeout is rounded up to multiple of `ipc.ping.interval`. |
+| `hdds.scmclient.max.retry.timeout` | 10min | Maximum retry timeout for SCM Client. |
+| `hdds.scmclient.failover.max.retry` | 15 | Maximum retry count for SCM Client when failover happens. If `maxRetryTimeout / retryInterval` is larger than this value, the calculated value is used instead. |
+| `hdds.scmclient.failover.retry.interval` | 2s | Time to wait between retry attempts to other SCM IP. |
+
+## Client to Datanode Failover and Retry
+
+Clients retry Datanodes in order upon failure. The retry behavior differs for read and write operations:

Review Comment:
Maybe add a little clarification here; otherwise it sounds like a client retries every Datanode in the cluster, which is not the case. Instead, it retries only the Datanodes in the same pipeline.

```suggestion
Clients retry Datanodes in the pipeline in order upon failure. In other words, a client accessing a block that belongs to a RATIS/3 pipeline may retry up to 3 Datanodes. The retry behavior differs for read and write operations:
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
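
As a sanity check on the defaults quoted in the diff, the retry budgets they imply can be worked out with a few lines of arithmetic. This is only a rough sketch, not Ozone code: the function names are hypothetical, the OM estimate assumes a fixed wait between attempts and ignores the time spent in the RPCs themselves, and the SCM rule is taken literally from the table's description of `hdds.scmclient.failover.max.retry`.

```python
from datetime import timedelta


def om_worst_case_wait(max_attempts: int, wait_between: timedelta) -> timedelta:
    """Upper bound on time spent waiting between OM failover attempts.

    Hypothetical helper: assumes a constant wait and counts one wait per attempt.
    """
    return max_attempts * wait_between


def scm_effective_max_retry(max_retry: int,
                            max_retry_timeout: timedelta,
                            retry_interval: timedelta) -> int:
    """Retry count after applying the rule quoted in the table.

    If maxRetryTimeout / retryInterval is larger than the configured count,
    the calculated value is used instead (hypothetical helper, not Ozone's API).
    """
    calculated = max_retry_timeout // retry_interval  # timedelta // timedelta -> int
    return max(max_retry, calculated)


if __name__ == "__main__":
    # Defaults quoted in the diff: 500 attempts, 2 seconds apart.
    print(om_worst_case_wait(500, timedelta(seconds=2)))  # 0:16:40

    # Defaults quoted in the table: 15 retries, 10min timeout, 2s interval.
    print(scm_effective_max_retry(15,
                                  timedelta(minutes=10),
                                  timedelta(seconds=2)))  # 300
```

Under these assumptions the default SCM retry count of 15 is superseded by the calculated value of 300, which may be worth stating explicitly next to the table.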
