[ https://issues.apache.org/jira/browse/RATIS-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze reassigned RATIS-2129:
---------------------------------

    Assignee: Janus Chow  (was: Tsz-wo Sze)

> Low replication performance because of lock contention on RaftLog
> -----------------------------------------------------------------
>
>                 Key: RATIS-2129
>                 URL: https://issues.apache.org/jira/browse/RATIS-2129
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.1.0
>            Reporter: Duong
>            Assignee: Janus Chow
>            Priority: Major
>              Labels: performance
>         Attachments: Screenshot 2024-07-22 at 4.40.07 PM-1.png, Screenshot 
> 2024-07-22 at 4.40.07 PM.png, dn_echo_leader_profile.html, 
> image-2024-07-22-15-25-46-155.png, ratis_ratfLog_lock_contention.png
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Today, the GrpcLogAppender thread makes many calls that need RaftLog's 
> readLock. In an active environment, RaftLog is always busy appending 
> transactions from clients, so the writeLock is frequently held, and this 
> slows down replication. 
> See [^dn_echo_leader_profile.html], or the picture below, where the purple 
> portions show the time taken to acquire the readLock from RaftLog.
> !image-2024-07-22-15-25-46-155.png|width=854,height=425!
> h2. A summary of lock contention in Ratis
> !ratis_ratfLog_lock_contention.png|width=392,height=380!
> Today, RaftLog consistency is protected by a global ReadWriteLock ("global" 
> meaning RaftLog has a single ReadWriteLock, acquired at the scope of the 
> whole RaftLog instance, i.e. one lock per RaftGroup).
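> For illustration, here is a minimal sketch of this locking model (the class 
> and method names are simplified assumptions for the sketch, not the actual 
> Ratis API):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> 
> // A simplified model of a log guarded by one global ReadWriteLock,
> // i.e. a single lock per RaftLog instance (per RaftGroup).
> class SimpleRaftLog {
>   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>   private final List<String> entries = new ArrayList<>();
> 
>   // Writer path: every append takes the writeLock.
>   void append(String entry) {
>     lock.writeLock().lock();
>     try {
>       entries.add(entry);
>     } finally {
>       lock.writeLock().unlock();
>     }
>   }
> 
>   // Reader path (StateMachineUpdater, GrpcLogAppender): every
>   // single-entry read takes the readLock and must wait whenever
>   // the writeLock is held.
>   String get(int index) {
>     lock.readLock().lock();
>     try {
>       return index < entries.size() ? entries.get(index) : null;
>     } finally {
>       lock.readLock().unlock();
>     }
>   }
> }
> {code}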
> In a RaftGroup, the following actors race to acquire this global 
> ReadWriteLock on the leader node (a runnable sketch of the race follows 
> the list):
>  * The writer, i.e. the GRPC Client Service, accepts transaction 
> submissions from Raft clients and appends transactions (log entries) to 
> RaftLog. Each append operation acquires RaftLog's writeLock to put the 
> transaction into RaftLog's in-memory queue. Although each of these append 
> operations is quick, Ratis is designed to maximize transaction appends, so 
> the writeLock is almost always busy.
>  * StateMachineUpdater: once a transaction is acknowledged by enough 
> followers, this single-threaded actor reads the log entry from RaftLog and 
> calls StateMachine to apply the transaction. It acquires RaftLog's readLock 
> for each log entry read. 
>  * GrpcLogAppender: for each follower, there is a GrpcLogAppender thread 
> that constantly reads log entries from RaftLog and replicates them to that 
> follower. This thread acquires RaftLog's readLock every time it reads a 
> log entry.
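> As a hedged, runnable sketch of this race (thread counts and the busy-poll 
> loop are assumptions for the demo; the real Ratis actors are event-driven, 
> not busy loops), building on the SimpleRaftLog sketch above:
> {code:java}
> import java.util.concurrent.CountDownLatch;
> 
> // Illustrative only: the three actor types above reduced to plain
> // threads contending on the single lock in SimpleRaftLog.
> public class ContentionDemo {
>   public static void main(String[] args) throws InterruptedException {
>     final SimpleRaftLog log = new SimpleRaftLog();
>     final CountDownLatch done = new CountDownLatch(1);
> 
>     // Writer (GRPC Client Service): appends transactions, taking the
>     // writeLock on every append.
>     Thread writer = new Thread(() -> {
>       for (int i = 0; i < 100_000; i++) {
>         log.append("tx-" + i);
>       }
>       done.countDown();
>     });
> 
>     // StateMachineUpdater: reads committed entries one by one.
>     Thread updater = new Thread(() -> pollEntries(log, done));
>     // One GrpcLogAppender thread per follower, each reading entry by entry.
>     Thread appender1 = new Thread(() -> pollEntries(log, done));
>     Thread appender2 = new Thread(() -> pollEntries(log, done));
> 
>     writer.start();
>     updater.start();
>     appender1.start();
>     appender2.start();
>     writer.join();
>   }
> 
>   // Each get() takes the readLock, so these readers stall whenever
>   // the writer holds the writeLock.
>   static void pollEntries(SimpleRaftLog log, CountDownLatch done) {
>     int next = 0;
>     while (done.getCount() > 0) {
>       if (log.get(next) != null) {
>         next++;
>       }
>     }
>   }
> }
> {code}
> Profiled under load, a toy like this can be expected to show the reader 
> threads spending much of their time blocked acquiring the readLock while 
> the writer is saturated, which is the same pattern the purple readLock 
> time in [^dn_echo_leader_profile.html] shows.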
> The writer, StateMachineUpdater, and GrpcLogAppender are all designed to 
> maximize their throughput. For instance, StateMachineUpdater invokes 
> StateMachine's applyTransaction asynchronously, and GrpcLogAppender 
> replicates log entries to the followers in the same fashion. 
> The global ReadWriteLock *creates a tough contention* between the RaftLog 
> writer and readers, and that is what limits Ratis throughput: the faster 
> the writer and the readers are, the more they block each other.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
