This is an automated email from the ASF dual-hosted git repository.
jojochuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/ozone-site.git
The following commit(s) were added to refs/heads/master by this push:
new 8f7a68c71 HDDS-15206. Document the OM follower read feature (Phase 1)
(#428)
8f7a68c71 is described below
commit 8f7a68c71fe959f51c0360c2128ac889d3048c68
Author: Ivan Andika <[email protected]>
AuthorDate: Wed May 20 12:37:23 2026 +0800
HDDS-15206. Document the OM follower read feature (Phase 1) (#428)
---
.../05-high-availability/02-om-ha.md | 1 +
.../05-high-availability/03-client-failover.md | 2 +-
.../05-high-availability/05-om-follower-read.md | 185 +++++++++++++++++++++
3 files changed, 187 insertions(+), 1 deletion(-)
diff --git
a/docs/05-administrator-guide/02-configuration/05-high-availability/02-om-ha.md
b/docs/05-administrator-guide/02-configuration/05-high-availability/02-om-ha.md
index 705a309cb..8a16b059e 100644
---
a/docs/05-administrator-guide/02-configuration/05-high-availability/02-om-ha.md
+++
b/docs/05-administrator-guide/02-configuration/05-high-availability/02-om-ha.md
@@ -169,6 +169,7 @@ host3 | om3 | FOLLOWER
- [Ozone Manager High
Availability](../../../core-concepts/high-availability/om-ha) - Conceptual
overview
- [Listener OM](./listener-om) - Read-only OM nodes for scaling reads
+- [OM Follower Read](./om-follower-read) - Client and OM settings for serving
eligible OM reads from followers
- [SCM HA Configuration](./scm-ha) - Storage Container Manager High
Availability configuration
<!--
diff --git
a/docs/05-administrator-guide/02-configuration/05-high-availability/03-client-failover.md
b/docs/05-administrator-guide/02-configuration/05-high-availability/03-client-failover.md
index 3e2cb10af..75f86548e 100644
---
a/docs/05-administrator-guide/02-configuration/05-high-availability/03-client-failover.md
+++
b/docs/05-administrator-guide/02-configuration/05-high-availability/03-client-failover.md
@@ -12,7 +12,7 @@ The failover and retry mechanisms operate transparently to
client applications.
## Client to Ozone Manager Failover
-Clients always submit requests to the leader Ozone Manager (OM). If the
`leader` is `unknown`, clients start by sending requests to the first OM in the
configuration and retries other OMs until a leader is found.
+By default, clients submit requests to the leader Ozone Manager (OM). If the
`leader` is `unknown`, clients start by sending requests to the first OM in the
configuration and retries other OMs until a leader is found. When [OM follower
read](./om-follower-read) is enabled, eligible read-only OM requests can be
sent to followers and fall back to the leader path when needed.
### 1. Hadoop RPC Transport (`HadoopRpcOMFailoverProxyProvider`)
diff --git
a/docs/05-administrator-guide/02-configuration/05-high-availability/05-om-follower-read.md
b/docs/05-administrator-guide/02-configuration/05-high-availability/05-om-follower-read.md
new file mode 100644
index 000000000..51e7fb6c9
--- /dev/null
+++
b/docs/05-administrator-guide/02-configuration/05-high-availability/05-om-follower-read.md
@@ -0,0 +1,185 @@
+---
+sidebar_label: OM Follower Read
+---
+
+# OM Follower Read
+
+OM follower read lets Ozone clients send eligible read-only OM requests to
non-leader OMs in an HA service. This can reduce read load on the OM leader
while preserving the normal leader path for writes and for read requests that
are not eligible for follower routing.
+
+Follower read is disabled by default. When it is enabled on the client, the
Hadoop RPC transport wraps the normal OM failover proxy with a follower-read
proxy. The proxy adds a read consistency hint to eligible read requests, tries
OM nodes in the configured service, and falls back to the leader path if
follower reads cannot be served.
+
+Client follower read is currently implemented only for the Hadoop RPC OM
transport. If a client is configured to use the gRPC OM transport,
`ozone.client.follower.read.enabled` does not enable follower-read routing for
that client.
+
+## Enable follower reads
+
+Use the Hadoop RPC transport for the client-to-OM channel. This is the default
transport, but it can be set explicitly in the client configuration used by
Ozone clients, S3 Gateway, or other services that issue OM reads:
+
+```xml
+<property>
+ <name>ozone.om.transport.class</name>
+
<value>org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory</value>
+</property>
+```
+
+Then set the client-side follower-read flag:
+
+```xml
+<property>
+ <name>ozone.client.follower.read.enabled</name>
+ <value>true</value>
+</property>
+```
+
+Do not set `ozone.om.transport.class` to
`org.apache.hadoop.ozone.om.protocolPB.GrpcOmTransportFactory` for clients that
should use OM follower read.
+
+The default follower-read consistency is `LINEARIZABLE_ALLOW_FOLLOWER`, which
preserves linearizable read semantics when the OM Ratis server supports
linearizable reads.
+
+```xml
+<property>
+ <name>ozone.client.follower.read.default.consistency</name>
+ <value>LINEARIZABLE_ALLOW_FOLLOWER</value>
+</property>
+```
+
+To serve `LINEARIZABLE_ALLOW_FOLLOWER` reads from followers, enable
linearizable reads on OM Ratis servers:
+
+```xml
+<property>
+ <name>ozone.om.ha.raft.server.read.option</name>
+ <value>LINEARIZABLE</value>
+</property>
+```
+
+If `ozone.om.ha.raft.server.read.option` remains `DEFAULT`, followers cannot
serve linearizable follower reads. The request falls back to the leader path.
+
+## Follower read consistency
+
+`ozone.client.follower.read.default.consistency` controls the consistency hint
used when a client sends an eligible read request to a follower.
+
+| Consistency | Behavior |
+|-------------|----------|
+| `LINEARIZABLE_ALLOW_FOLLOWER` | The client may send a read to a follower.
The OM uses the Ratis ReadIndex path so the follower does not return stale
data. |
+| `LOCAL_LEASE` | The follower may serve local metadata directly when its
leader RPC time and log-index lag are within configured bounds. This is a
bounded-staleness mode. |
+
+The follower-read default must be a consistency mode that allows follower
reads. Valid values for `ozone.client.follower.read.default.consistency` are
`LINEARIZABLE_ALLOW_FOLLOWER` and `LOCAL_LEASE`. Setting it to `DEFAULT` or
`LINEARIZABLE_LEADER_ONLY` is invalid.
+
+## Leader read consistency
+
+`ozone.client.leader.read.default.consistency` controls the consistency hint
used when a client sends a read request through the leader path. It can be
changed separately from follower-read consistency:
+
+```xml
+<property>
+ <name>ozone.client.leader.read.default.consistency</name>
+ <value>DEFAULT</value>
+</property>
+```
+
+| Consistency | Behavior |
+|-------------|----------|
+| `DEFAULT` | The leader serves reads using the default leader path for
backward compatibility. |
+| `LINEARIZABLE_LEADER_ONLY` | The leader serves the read only after a
linearizable leadership check. |
+
+The leader-read default must be a consistency mode that does not allow
follower reads. Valid values for `ozone.client.leader.read.default.consistency`
are `DEFAULT` and `LINEARIZABLE_LEADER_ONLY`. Setting it to
`LINEARIZABLE_ALLOW_FOLLOWER` or `LOCAL_LEASE` is invalid.
+
+## Raft linearizability and performance
+
+Linearizable reads make an OM read observe the latest committed Raft state.
With follower reads, this matters because a follower can be behind the leader
even after the leader has committed newer metadata changes.
+
+When OM Ratis uses the `LINEARIZABLE` read option, a read goes through the
Ratis linearizable read path. Ratis verifies that the serving OM is still
consistent with the current leader and that the read is ordered after the
committed log entries it must observe.
+
+For followers, this normally means using the ReadIndex protocol before serving
the local OM metadata. The follower can still serve the response, but it first
coordinates with Raft so the read is not satisfied from stale local state.
+
+This has a performance cost. A linearizable follower read avoids loading every
read on the OM leader's RPC handler and RocksDB path, but it can add Raft
coordination latency compared with a local read. Under read-heavy load, it can
improve leader CPU and request concurrency while increasing per-read latency
relative to `DEFAULT` leader reads.
+
+`DEFAULT` and `LINEARIZABLE_LEADER_ONLY` both route reads to the OM leader,
but they make different consistency and latency trade-offs.
+
+With `DEFAULT`, the ready leader serves the read directly from local committed
metadata. This is the fastest leader-read path because it does not perform a
Raft linearizable read check for every request.
+
+With `LINEARIZABLE_LEADER_ONLY`, the leader first verifies that it is still
the current leader before serving the read. This prevents stale reads from an
old leader, but the extra Raft coordination can add latency.
+
+The split-brain case is the reason for the distinction. During a network
partition, an old leader might be separated from the majority while a new
leader is elected. Until the old leader detects the loss of leadership, a
direct `DEFAULT` read from that old leader can return stale data because it
does not have transactions committed by the new leader.
+
+In normal operation, that split-brain window is expected to be rare. Ozone
therefore keeps `DEFAULT` as the leader-read default for backward compatibility
and performance, because the latency benefit is usually more valuable than
strict linearizability for every leader read.
+
+If strict linearizability is required, use `LINEARIZABLE_LEADER_ONLY` for
leader reads, or `LINEARIZABLE_ALLOW_FOLLOWER` when followers should also be
allowed to serve reads. If `ozone.om.allow.leader.skip.linearizable.read` is
enabled, a ready leader can skip the linearizable read check and serve local
committed data directly, which gives the same trade-off as the default leader
path.
+
+`LOCAL_LEASE` removes the ReadIndex round trip when the follower is within
configured lag bounds. This gives the lowest follower-read latency, but it is a
bounded-staleness mode rather than a linearizable mode.
+
+## Network partitions and ReadIndex timeout
+
+Linearizable follower reads depend on the follower being able to complete the
Ratis ReadIndex protocol. If a follower is partitioned away from the majority,
it cannot confirm a current read index with the leader. The read waits until
the Ratis read timeout expires, then fails and the client can retry or fail
over to another OM.
+
+In Ozone, OM HA Ratis server properties can be configured by prefixing the
Ratis property with `ozone.om.ha.`. For example, Ratis
`raft.server.read.timeout` becomes `ozone.om.ha.raft.server.read.timeout`.
+
+The main Ratis timeout setting for follower-read behavior is:
+
+| Property | Ratis default | Description |
+|----------|---------------|-------------|
+| `ozone.om.ha.raft.server.read.timeout` | `10s` | Timeout for linearizable
read-only requests, including the wait for the follower to complete the read
against a current Raft state. |
+
+## Ratis leader lease
+
+Ratis leader lease can reduce the cost of linearizable leader reads. When the
leader has a valid lease based on recent majority heartbeats, it can serve a
linearizable leader read without sending a fresh heartbeat round for that read.
+
+This optimization applies to leader reads, such as `LINEARIZABLE_LEADER_ONLY`
and the leader side of `LINEARIZABLE_ALLOW_FOLLOWER`. It does not make a
partitioned follower able to serve a linearizable follower read.
+
+The trade-off is failover timing. The lease is derived from the minimum
election timeout and `raft.server.read.leader.lease.timeout.ratio`. A new
leader should not be elected until the old leader's lease can no longer be
valid, so enabling or lengthening the lease can increase the time before a
partitioned leader is replaced.
+
+Leader lease also assumes bounded clock drift across OM hosts. If clocks can
drift beyond the lease assumptions, the linearizability guarantee can be
weakened because different OMs may disagree about whether the old leader's
lease is still valid.
+
+Leader lease can be configured with:
+
+| Property | Ratis default | Description |
+|----------|---------------|-------------|
+| `ozone.om.ha.raft.server.read.leader.lease.enabled` | `false` in Ozone |
Enables Ratis leader lease for linearizable leader reads. |
+| `ozone.om.ha.raft.server.read.leader.lease.timeout.ratio` | `0.9` | Controls
the maximum timeout ratio used by the Ratis leader lease. |
+
+## Local lease reads
+
+`LOCAL_LEASE` trades strict linearizability for lower latency by allowing a
follower to serve local data without running ReadIndex, but only while the
follower is close enough to the leader.
+
+Enable local lease reads on OMs:
+
+```xml
+<property>
+ <name>ozone.om.follower.read.local.lease.enabled</name>
+ <value>true</value>
+</property>
+```
+
+Then configure clients to request `LOCAL_LEASE` for follower reads:
+
+```xml
+<property>
+ <name>ozone.client.follower.read.default.consistency</name>
+ <value>LOCAL_LEASE</value>
+</property>
+```
+
+The OM accepts a local lease read only when both configured limits pass:
+
+| Property | Default | Description |
+|----------|---------|-------------|
+| `ozone.om.follower.read.local.lease.log.limit` | `10000` | Maximum allowed
gap between the follower's last applied index and the leader's commit index.
Set to `-1` to allow unbounded log lag. |
+| `ozone.om.follower.read.local.lease.time.ms` | `5000` | Maximum elapsed time
since the follower last heard from the leader. Set to `-1` to allow unbounded
time lag. |
+
+:::caution
+Use `LOCAL_LEASE` only for workloads that can tolerate bounded stale metadata.
It does not guarantee monotonic reads across client failover from a newer OM to
an older-but-still-within-bounds follower.
+:::
+
+## Request routing and fallback
+
+Only read-only OM request types are eligible for follower read routing. Writes
and ineligible reads continue to use the leader failover path.
+
+When follower read is enabled, the client:
+
+1. Adds the configured follower-read consistency hint to eligible read
requests that do not already carry a hint.
+2. Tries OM proxies from the HA service configuration.
+3. Uses follower reads when the target OM supports the requested consistency.
+4. Falls back to the leader failover path if follower reads are disabled,
unsupported, exhausted, or the request is not eligible.
+
+If a follower returns `OMNotLeaderException` for a follower-read request, the
client treats that as a signal that follower read is unsupported or
misconfigured, disables follower read for that proxy provider, and retries
through the leader path.
+
+## Related features
+
+Follower read works with both voting follower OMs and Listener OMs. Listener
OMs are read-only, non-voting members that can be used to add more read-serving
OM replicas. See [Listener OM](./listener-om) for listener-specific setup and
limitations.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]