[
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058029#comment-18058029
]
Xinyu Tan edited comment on RATIS-2403 at 2/12/26 5:03 AM:
-----------------------------------------------------------
[~ivanandika]
I believe the core issue of balancing read/write performance needs to be
examined from two distinct perspectives:
# Effectiveness of Follower Read: We can define effectiveness through the
following experimental model:
* Baseline: Both read and write loads are low, and the system has no resource
bottlenecks.
* Introducing a Bottleneck: Increase the query load on the Leader N-fold
(where N is the number of replicas), forcing the Leader into a bottlenecked state. At this
point, write throughput will inevitably drop due to resource contention, and
query throughput will also be capped.
* Enabling Follower Read: Distribute these additional query loads evenly
across all replicas.
* Expected Outcome: If write performance returns to its baseline state (no
longer interfered with by queries) while the total query throughput increases
significantly without hitting physical limits across the group, it proves that
Ratis’s Follower Read is truly effective in offloading the Leader and
decoupling resources.
# Resource Zero-Sum Game under High Pressure:
In extreme stress-test scenarios, the total read/write capacity of the
consensus group is capped by physical resources (CPU, I/O, Network).
For example, even without Follower Read, if we pin all reads and writes
to the Leader until it bottlenecks, any further increase in write load will
inevitably decrease query throughput. This phenomenon persists even with
Follower Read enabled, as faster writes force Followers to consume more
resources for log synchronization and application (Apply), which in turn
encroaches on query resources.
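The two perspectives above can be illustrated with a toy capacity model. This is a minimal sketch under stated assumptions, not a model of Ratis internals: every replica has the same capacity C (abstract cost units per second), each write costs w on every replica (append/replication on the Leader, log synchronization plus Apply on Followers), each read costs r on whichever replica serves it, Follower Read spreads reads evenly over N replicas, and the ReadIndex round trip to the Leader is ignored. The class and method names are hypothetical.

```java
/** Hypothetical toy capacity model; all costs and capacities are illustrative assumptions. */
final class CapacityModel {
  private CapacityModel() {}

  /** Max read QPS when every read is served by the Leader. */
  static double maxReadsLeaderOnly(double capacity, double writeCost, double readCost, double writeQps) {
    // The write stream is paid first; whatever capacity is left can serve reads.
    double headroom = capacity - writeCost * writeQps;
    return Math.max(0.0, headroom) / readCost;
  }

  /** Max total read QPS when reads are spread evenly over all replicas (Follower Read). */
  static double maxReadsFollowerRead(int replicas, double capacity, double writeCost,
                                     double readCost, double writeQps) {
    // Every replica pays for the same write stream (assumed equal cost on Leader and
    // Followers), so each replica has the same read headroom, multiplied by the replica count.
    return replicas * maxReadsLeaderOnly(capacity, writeCost, readCost, writeQps);
  }
}
```

With C = 100, w = r = 1 and 40 writes/s, leader-only reads top out at 60/s, while Follower Read across 3 replicas tops out at 180/s (perspective 1). Raising writes to 70/s lowers these to 30/s and 90/s respectively, because the write stream is paid on every replica (perspective 2).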
Based on the above analysis, I suggest the following directions:
* Introduce Admission Control (Rate Limiting): Simply optimizing the algorithm
cannot change the total resource limit. To fundamentally address the mutual
interference between reads and writes, we might need a rate-limiting mechanism
for writes. This would allow us to explicitly define a resource ceiling for
writes within this trade-off, leaving guaranteed headroom for query throughput.
* Enhance Observability and Identify Optimization Opportunities: I recommend
analyzing disk IOPS, network bandwidth, and CPU flame graphs during future
stress tests. Quantitative data—such as WAL write latency, gRPC serialization
overhead, or state machine lock contention—is critical for pinpointing
bottlenecks. For instance, in a previous optimization where I simply batched
the put operations for the WAL blocking queue, I managed to save nearly 20% of
CPU usage for Apache IoTDB. I believe that under current stress-test scenarios,
we can uncover many more similar optimization opportunities in Ratis by using
profiling tools.
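For the admission-control direction, one common way to express a write ceiling is a token bucket. The sketch below is hypothetical (WriteRateLimiter is not a Ratis API) and assumes the caller passes a monotonic clock reading such as System.nanoTime(), which also makes it easy to test deterministically.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical token-bucket limiter for write admission; not a Ratis API. */
final class WriteRateLimiter {
  private final double permitsPerNano;
  private final double maxBurst;
  private double available;
  private long lastRefillNanos;

  WriteRateLimiter(double writesPerSecond, double maxBurst, long nowNanos) {
    this.permitsPerNano = writesPerSecond / TimeUnit.SECONDS.toNanos(1);
    this.maxBurst = maxBurst;
    this.available = maxBurst; // start with a full bucket
    this.lastRefillNanos = nowNanos;
  }

  /** Returns true if one write may be admitted at nowNanos; false means reject or retry later. */
  synchronized boolean tryAcquire(long nowNanos) {
    // Refill tokens for the elapsed time, capped at the burst size.
    long elapsed = Math.max(0L, nowNanos - lastRefillNanos);
    available = Math.min(maxBurst, available + elapsed * permitsPerNano);
    lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
    if (available >= 1.0) {
      available -= 1.0;
      return true;
    }
    return false;
  }
}
```

A write path could call tryAcquire(System.nanoTime()) before submitting a request and, on false, either retry later or fail fast with a busy error. writesPerSecond is the explicit write budget discussed above; choosing it still depends on the workload, the same tuning concern raised for the follower-gap approach in the issue description.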
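The WAL batching mentioned above is not shown in the comment; the following is only a guess at the general shape, not the actual Apache IoTDB change. It assumes a java.util.concurrent.BlockingQueue between the writer threads and a WAL flush thread, and groups entries so that each put hands over a list instead of a single entry, amortizing the queue lock and consumer wakeup. BatchingWalProducer is a hypothetical name, and a single producer thread is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/** Hypothetical producer-side batching in front of a WAL queue; not IoTDB or Ratis code. */
final class BatchingWalProducer<E> {
  private final BlockingQueue<List<E>> walQueue;
  private final int maxBatchSize;
  // Not synchronized: assumes a single producer thread calls append/flush.
  private List<E> pending = new ArrayList<>();

  BatchingWalProducer(BlockingQueue<List<E>> walQueue, int maxBatchSize) {
    this.walQueue = walQueue;
    this.maxBatchSize = maxBatchSize;
  }

  /** Buffers an entry and hands a full batch to the WAL queue. */
  void append(E entry) throws InterruptedException {
    pending.add(entry);
    if (pending.size() >= maxBatchSize) {
      flush();
    }
  }

  /** Hands any buffered entries to the WAL queue, e.g. on a timer or before a sync. */
  void flush() throws InterruptedException {
    if (pending.isEmpty()) {
      return;
    }
    // One put (one queue-lock acquisition, at most one consumer wakeup) per batch
    // instead of one per entry.
    walQueue.put(pending);
    pending = new ArrayList<>();
  }
}
```

A real version would also flush on a timer so that a partially filled batch is not delayed indefinitely under low load.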
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major
> Attachments: leader-backpressure.patch
>
>
> While benchmarking linearizable follower read, we observed that the more
> requests go to the followers instead of the leader, the better the write
> throughput becomes: we saw around a 2-3x write throughput increase compared to
> leader-only write and read (most likely due to less resource contention on the
> leader). However, the read throughput becomes worse than with leader-only
> write and read (in some cases below 0.2x). Even with optimizations such as
> RATIS-2392 RATIS-2382 [https://github.com/apache/ratis/pull/1334] RATIS-2379,
> the read throughput remains worse than leader-only write and read (these
> optimizations improve write performance rather than read performance).
> I suspect that because write throughput increases, the read index advances at
> a faster rate, which causes linearizable follower reads to wait longer.
> The target is to improve read throughput to 1.5x-2x that of leader-only
> write and read. Currently, with pure reads (no writes), follower read improves
> read throughput by up to 1.7x, but the total follower read throughput is still
> far below this target.
> My current ideas are:
> * Sacrificing writes for reads: can we limit the write QPS so that the read QPS
> can increase?
> ** From the benchmark, the read throughput only improves when the write
> throughput is lower.
> ** We can try a backpressure mechanism so that writes do not advance so
> quickly that read throughput suffers.
> *** One option is a follower gap mechanism (RATIS-1411), but this might cause
> the leader to stall if a follower is down for a while (e.g. restarted), which
> violates the majority availability guarantee. It is also hard to know which
> value is optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
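The suspicion in the description can be illustrated with a sketch of the follower side of a ReadIndex-style linearizable read: the follower obtains a read index (the leader's commit index at the time of the read) and must wait until its own state machine has applied at least that index before serving the query. This is a hypothetical sketch (AppliedIndexWaiter is not Ratis code), assuming an apply loop calls onApplied after each entry is applied.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical follower-side wait for a ReadIndex-style linearizable read; not Ratis code. */
final class AppliedIndexWaiter {
  private long appliedIndex;

  /** Called by the apply loop after a log entry is applied to the state machine. */
  synchronized void onApplied(long index) {
    if (index > appliedIndex) {
      appliedIndex = index;
      notifyAll(); // wake reads waiting for this or an earlier index
    }
  }

  /**
   * Blocks until the local state machine has applied at least readIndex, or the timeout
   * elapses. Returns true if the read may proceed.
   */
  synchronized boolean awaitApplied(long readIndex, long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (appliedIndex < readIndex) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      // Releases this monitor while waiting, like Object.wait.
      TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
    }
    return true;
  }
}
```

In this model each read waits for the apply lag present when its read index was fetched. Higher write throughput tends to keep more committed-but-unapplied entries in that window, which matches the suspicion above, though it would need measurement (for example, the gap between read index and applied index per read) to confirm.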