[
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058029#comment-18058029
]
Xinyu Tan commented on RATIS-2403:
----------------------------------
[~ivanandika]
I believe the core issue of balancing read/write performance needs to be
examined from two distinct perspectives:
# Effectiveness of Follower Read:
Under a fixed load where the Leader has not yet reached its physical
bottleneck, we need to verify whether enabling Follower Read effectively
balances the query load across all replicas. If we observe a query throughput
increase nearly proportional to the number of replicas while write throughput
remains stable, it proves that the current Follower Read mechanism in Ratis is
effective in offloading the Leader.
# Resource Zero-Sum Game under High Pressure:
In extreme stress-test scenarios, the total read/write capacity of the
consensus group is capped by physical resources (CPU, I/O, Network).
A clear example is: even without Follower Read, if we pin all reads and writes
to the Leader until it bottlenecks, any further increase in write load will
inevitably decrease query throughput. This phenomenon persists even with
Follower Read enabled, as faster writes force Followers to consume more
resources for log synchronization and application (Apply), which in turn
encroaches on query resources.
Based on the above analysis, I suggest the following directions:
* Introduce Admission Control (Rate Limiting): Simply optimizing the algorithm
cannot change the total resource limit. To fundamentally address the mutual
interference between reads and writes, we might need a rate-limiting mechanism
for writes. This would allow us to explicitly define a resource ceiling for
writes within this trade-off, leaving guaranteed headroom for query throughput.
* Enhance Observability and Identify Optimization Opportunities: I recommend
analyzing disk IOPS, network bandwidth, and CPU flame graphs during future
stress tests. Quantitative data—such as WAL write latency, gRPC serialization
overhead, or state machine lock contention—is critical for pinpointing
bottlenecks. For instance, in a previous optimization where I simply batched
the put operations for the WAL blocking queue, I managed to save nearly 20% of
CPU usage for Apache IoTDB. I believe that under current stress-test scenarios,
we can uncover many more similar optimization opportunities in Ratis by using
profiling tools.
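To make the admission-control suggestion above concrete, here is a minimal
token-bucket sketch in Java. It is only an illustration under my own
assumptions: the class name WriteAdmissionLimiter, its parameters, and the
injectable clock are hypothetical and not part of the Ratis API, and where
such a check would be wired into the write path is left open.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical token-bucket limiter for write admission (not a Ratis class).
 * Tokens refill at permitsPerSecond, capped at burst; each admitted write
 * consumes one token.
 */
final class WriteAdmissionLimiter {
  private final double permitsPerSecond;
  private final double burst;
  private final LongSupplier nanoClock; // injectable so the behaviour can be tested
  private double available;
  private long lastRefillNanos;

  WriteAdmissionLimiter(double permitsPerSecond, double burst, LongSupplier nanoClock) {
    if (permitsPerSecond <= 0 || burst < 1) {
      throw new IllegalArgumentException("permitsPerSecond must be > 0 and burst >= 1");
    }
    this.permitsPerSecond = permitsPerSecond;
    this.burst = burst;
    this.nanoClock = nanoClock;
    this.available = burst; // start with a full bucket
    this.lastRefillNanos = nanoClock.getAsLong();
  }

  WriteAdmissionLimiter(double permitsPerSecond, double burst) {
    this(permitsPerSecond, burst, System::nanoTime);
  }

  /** Returns true if the write is admitted, false if the caller should back off. */
  synchronized boolean tryAcquire() {
    long now = nanoClock.getAsLong();
    double elapsedSeconds = (now - lastRefillNanos) / 1e9;
    lastRefillNanos = now;
    // Refill in proportion to elapsed time, never exceeding the burst size.
    available = Math.min(burst, available + elapsedSeconds * permitsPerSecond);
    if (available >= 1.0) {
      available -= 1.0;
      return true;
    }
    return false;
  }
}
```

A caller would check tryAcquire() before submitting a write and reject or
delay the request when it returns false. The rate itself would have to be
tuned per workload, which is the same difficulty noted for the follower gap
value in the issue description.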
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major
> Attachments: leader-backpressure.patch
>
>
> While benchmarking linearizable follower read, we observed that the more
> requests go to the followers instead of the leader, the better write
> throughput becomes. We saw around a 2-3x write throughput increase compared to
> leader-only write and read (most likely due to less leader resource
> contention). However, read throughput becomes worse than with leader-only
> write and read (in some cases below 0.2x). Even with optimizations such as
> RATIS-2392 RATIS-2382 [https://github.com/apache/ratis/pull/1334] RATIS-2379,
> read throughput remains worse than with leader-only write and read (the
> optimizations even improve write performance instead of read performance).
> I suspect that because write throughput increases, the read index advances at
> a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x - 2x that of leader-only
> write and read. Currently, with pure reads (no writes), follower read
> improves read throughput by up to 1.7x, but total follower read throughput is
> far below this target.
> Currently my ideas are:
> * Sacrificing writes for reads: Can we limit the write QPS so that read QPS
> can increase?
> ** From the benchmark, the read throughput only improves when write
> throughput is lower
> ** We can try to use a backpressure mechanism so that writes do not advance
> so quickly that read throughput suffers
> *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
> stall if a follower is down for a while (e.g. restarted), which violates the
> majority availability guarantee. It's also hard to know which value is
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)