[ https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079322#comment-18079322 ]

Ivan Andika edited comment on RATIS-2403 at 5/8/26 5:45 AM:
------------------------------------------------------------

Observation

* For read-dominated workloads, read throughput improves, but write throughput 
suffers
** This is slightly better than before, since before these improvements, read 
throughput suffered even in read-dominated workloads
** However, we recently fixed a large bottleneck introduced by Hadoop metrics 
that improved leader read performance 5x (which is why we can reach around 
230K QPS). After this fix, however, the read throughput improvement from 
leader batch write disappears (both write and read throughput decreased), 
although the read throughput decrease is less severe when leader batch write 
is not used.
* For write-dominated workloads, write throughput improves, but read throughput 
suffers
* Reducing the batch interval from 10ms to 5ms causes read throughput to 
suffer overall
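
One way to see why the batch interval matters: assuming "leader batch write" means the leader buffers incoming writes and flushes them as one batch once the batch interval has elapsed (an assumption; the attached leader-batch-write.patch may work differently), a shorter interval produces more, smaller flushes. The minimal Java sketch below counts flushes for both intervals; IntervalWriteBatcherSketch, offer and flushes are hypothetical names, not Ratis APIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT the attached leader-batch-write.patch: the leader
 * buffers writes and flushes them as one batch once batchIntervalMs has elapsed
 * since the first buffered write. Time is passed in explicitly to stay deterministic.
 */
public class IntervalWriteBatcherSketch {
  private final long batchIntervalMs;
  private final List<String> buffer = new ArrayList<>();
  private long firstBufferedAtMs = -1;
  private int flushes = 0;

  public IntervalWriteBatcherSketch(long batchIntervalMs) {
    this.batchIntervalMs = batchIntervalMs;
  }

  /** Buffers a write; returns the flushed batch if the interval elapsed, else an empty list. */
  public List<String> offer(String write, long nowMs) {
    if (buffer.isEmpty()) {
      firstBufferedAtMs = nowMs; // the interval starts with the first buffered write
    }
    buffer.add(write);
    if (nowMs - firstBufferedAtMs >= batchIntervalMs) {
      List<String> batch = new ArrayList<>(buffer);
      buffer.clear();
      flushes++; // assumption: one flush = one log append = one commit-index advance
      return batch;
    }
    return List.of();
  }

  public int flushes() {
    return flushes;
  }

  public static void main(String[] args) {
    for (long interval : new long[] {10, 5}) {
      IntervalWriteBatcherSketch batcher = new IntervalWriteBatcherSketch(interval);
      // 100 writes arriving 1 ms apart
      for (int t = 0; t < 100; t++) {
        batcher.offer("w" + t, t);
      }
      System.out.println(interval + "ms -> " + batcher.flushes() + " flushes");
    }
    // prints "10ms -> 9 flushes" then "5ms -> 16 flushes"
  }
}
```

If each flush advances the commit index that followers use as their read index, halving the interval roughly doubles how often that index moves, which would be consistent with (though it does not prove) the read throughput drop observed above. The sketch only flushes when a new write arrives; a real batcher would also need a timer to flush an idle buffer.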

Notable info

* The benchmark only measures OM metadata throughput (it only handles 
0-sized keys). In workloads with keys that carry actual data, the added metadata 
latency might be amortized by the data write and read latency.
* The benchmark only uses 100 clients; with more clients, the read 
throughput increase might be more noticeable.
* The benchmark mixes read and write clients. With read-only clients, 
the read throughput increase might be larger. Conversely, with write-only 
clients, the write throughput might be worse.
* The benchmark only operates on a single bucket, so write and read 
throughput might be bottlenecked by the bucket lock. Using multiple buckets 
might help.


> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
>                 Key: RATIS-2403
>                 URL: https://issues.apache.org/jira/browse/RATIS-2403
>             Project: Ratis
>          Issue Type: Improvement
>          Components: Linearizable Read
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: 1362_review.patch, 1362_review2.patch, 
> LAW_THEOREM_RATIS_ANALYSIS.md, leader-backpressure.patch, 
> leader-batch-write.patch
>
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> While benchmarking linearizable follower read, we observed that the 
> more requests go to the followers instead of the leader, the better the write 
> throughput becomes: we saw around a 2-3x write throughput increase compared to 
> leader-only writes and reads (most likely due to less resource contention on 
> the leader). However, the read throughput becomes worse than with leader-only 
> writes and reads (in some cases below 0.2x). Even with optimizations such as 
> RATIS-2392, RATIS-2382, [https://github.com/apache/ratis/pull/1334], and 
> RATIS-2379, the read throughput remains worse than with leader-only writes and 
> reads (these optimizations even improve write performance instead of read 
> performance).
> I suspect that because write throughput increases, the read index advances at 
> a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x - 2x that of leader-only 
> writes and reads. Currently, pure reads (no writes) improve read throughput 
> by up to 1.7x, but total follower read throughput is well below this target.
> Currently my ideas are:
>  * Sacrificing writes for reads: can we limit the write QPS so that read QPS 
> can increase?
>  ** From the benchmark, the read throughput only improves when write 
> throughput is lower
>  ** We can try a backpressure mechanism so that writes do not advance so 
> quickly that read throughput suffers
>  *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to 
> stall if a follower is down for a while (e.g. restarted), which violates the 
> majority availability guarantee. It is also hard to know which value is 
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 
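
The read-index wait suspected in the description above can be sketched as follows. In the usual Raft ReadIndex approach, a follower obtains a read index (the leader's commit index) and may only answer once its own applied index has caught up. This minimal Java sketch assumes a single follower-side gate; FollowerReadIndexSketch, waitForApplied, onApplied and pendingReads are hypothetical names, not Ratis APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical sketch (not the Ratis API): a follower serves a linearizable read
 * only after its applied index has reached the read index obtained from the leader.
 */
public class FollowerReadIndexSketch {
  private long appliedIndex = 0;
  // Pending reads grouped by the read index they are waiting for.
  private final TreeMap<Long, List<CompletableFuture<Long>>> waiters = new TreeMap<>();

  /** Returns a future that completes once appliedIndex >= readIndex. */
  public synchronized CompletableFuture<Long> waitForApplied(long readIndex) {
    CompletableFuture<Long> future = new CompletableFuture<>();
    if (appliedIndex >= readIndex) {
      future.complete(appliedIndex); // already caught up: serve immediately
    } else {
      waiters.computeIfAbsent(readIndex, k -> new ArrayList<>()).add(future);
    }
    return future;
  }

  /** Called after the state machine applies the log entry at {@code index}. */
  public void onApplied(long index) {
    List<CompletableFuture<Long>> ready = new ArrayList<>();
    final long applied;
    synchronized (this) {
      appliedIndex = Math.max(appliedIndex, index);
      applied = appliedIndex;
      // Every read whose read index is <= appliedIndex can now be served.
      NavigableMap<Long, List<CompletableFuture<Long>>> covered = waiters.headMap(applied, true);
      for (List<CompletableFuture<Long>> futures : covered.values()) {
        ready.addAll(futures);
      }
      covered.clear(); // the view is backed by waiters, so this removes them there too
    }
    // Complete outside the lock so callbacks cannot re-enter while we iterate.
    for (CompletableFuture<Long> future : ready) {
      future.complete(applied);
    }
  }

  public synchronized int pendingReads() {
    int n = 0;
    for (List<CompletableFuture<Long>> futures : waiters.values()) {
      n += futures.size();
    }
    return n;
  }

  public static void main(String[] args) {
    FollowerReadIndexSketch follower = new FollowerReadIndexSketch();
    follower.onApplied(100);
    CompletableFuture<Long> r1 = follower.waitForApplied(100); // served immediately
    // Writes keep committing, so the next read index from the leader is already higher.
    CompletableFuture<Long> r2 = follower.waitForApplied(150);
    System.out.println(r1.isDone() + " " + r2.isDone()); // true false
    follower.onApplied(150);
    System.out.println(r2.isDone()); // true
  }
}
```

The faster writes commit, the further ahead of the follower's applied index a new read index tends to be, so more reads sit in the pending set; this illustrates the suspected mechanism rather than measuring it.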

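The follower-gap backpressure idea, and why it can stall the leader, can be illustrated with a minimal Java sketch. It assumes a rule that rejects new writes when the slowest follower's applied index lags the leader's commit index by more than maxGap entries; FollowerGapBackpressureSketch, acceptWrite and maxGap are hypothetical names, and this is not the RATIS-1411 implementation or its configuration.

```java
import java.util.List;

/**
 * Hypothetical sketch of gap-based write backpressure in the spirit of RATIS-1411
 * (NOT its actual implementation or configuration keys).
 */
public class FollowerGapBackpressureSketch {
  private final long maxGap;

  public FollowerGapBackpressureSketch(long maxGap) {
    this.maxGap = maxGap;
  }

  /** Accepts a new write only if every follower is within maxGap entries of the leader. */
  public boolean acceptWrite(long leaderCommitIndex, List<Long> followerAppliedIndexes) {
    long slowest = Long.MAX_VALUE;
    for (long applied : followerAppliedIndexes) {
      slowest = Math.min(slowest, applied);
    }
    return leaderCommitIndex - slowest <= maxGap;
  }

  public static void main(String[] args) {
    FollowerGapBackpressureSketch gate = new FollowerGapBackpressureSketch(1000);
    // Both followers are close behind (largest gap 500): writes proceed.
    System.out.println(gate.acceptWrite(5000, List.of(4800L, 4500L))); // true
    // One restarted follower stuck at 2000 (gap 3000) blocks all writes.
    System.out.println(gate.acceptWrite(5000, List.of(4800L, 2000L))); // false
  }
}
```

In the second call, a single lagging follower blocks every write even though the leader and the other follower still form a majority, which is the availability concern raised above; maxGap is also exactly the per-workload value that is hard to choose.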


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
