[
https://issues.apache.org/jira/browse/RATIS-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tsz-wo Sze reassigned RATIS-2129:
---------------------------------
Assignee: Janus Chow (was: Tsz-wo Sze)
> Low replication performance because of lock contention on RaftLog
> -----------------------------------------------------------------
>
> Key: RATIS-2129
> URL: https://issues.apache.org/jira/browse/RATIS-2129
> Project: Ratis
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.1.0
> Reporter: Duong
> Assignee: Janus Chow
> Priority: Major
> Labels: performance
> Attachments: Screenshot 2024-07-22 at 4.40.07 PM-1.png, Screenshot
> 2024-07-22 at 4.40.07 PM.png, dn_echo_leader_profile.html,
> image-2024-07-22-15-25-46-155.png, ratis_ratfLog_lock_contention.png
>
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> Today, the GrpcLogAppender thread makes a lot of calls that need RaftLog's
> readLock. In an active environment, RaftLog is constantly busy appending
> transactions from clients, so the writeLock is frequently held. This makes
> replication slow.
> See [^dn_echo_leader_profile.html], or the picture below, where the purple
> portion is the time spent acquiring the readLock from RaftLog.
> !image-2024-07-22-15-25-46-155.png|width=854,height=425!
> h2. A summary of lock contention in Ratis
> !ratis_ratfLog_lock_contention.png|width=392,height=380!
> Today, RaftLog consistency is protected by a global ReadWriteLock ("global"
> meaning that each RaftLog instance, i.e. each RaftGroup, has a single
> ReadWriteLock that is acquired at the scope of the whole instance).
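> A minimal sketch of this locking pattern (a hypothetical, simplified class
> used for illustration only, not the actual Ratis code):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> // Hypothetical simplification: a single ReadWriteLock guards the whole log.
> class SimplifiedRaftLog {
>   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>   private final List<String> entries = new ArrayList<>();
>
>   // Writer path (GRPC Client Service): every append takes the writeLock.
>   void append(String entry) {
>     lock.writeLock().lock();
>     try {
>       entries.add(entry);
>     } finally {
>       lock.writeLock().unlock();
>     }
>   }
>
>   // Reader path (StateMachineUpdater, GrpcLogAppender): every read takes the
>   // readLock, so it waits whenever an append holds the writeLock.
>   String get(int index) {
>     lock.readLock().lock();
>     try {
>       return entries.get(index);
>     } finally {
>       lock.readLock().unlock();
>     }
>   }
> }
> {code}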
> In a RaftGroup, the following actors race to obtain this global ReadWriteLock
> on the leader node (a rough demo of this race follows the list):
> * The writer, which is the GRPC Client Service, accepts transaction
> submissions from Raft clients and appends transactions (or log entries) to
> RaftLog. Each append operation needs to acquire the writeLock from RaftLog to
> put the transaction into RaftLog's memory queue. Although each of these append
> operations is quick, Ratis is designed to maximize transaction appends, so the
> writeLock is almost always busy.
> * StateMachineUpdater. For each transaction, once it is acknowledged by
> enough followers, this single-threaded actor reads the log entry from RaftLog
> and calls StateMachine to apply the transaction. It acquires the readLock from
> RaftLog for each log entry it reads.
> * GrpcLogAppender: for each follower, there is a GrpcLogAppender thread that
> constantly reads log entries from RaftLog and replicates them to that
> follower. This thread acquires the readLock from RaftLog every time it reads a
> log entry.
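> The rough, self-contained demo below (hypothetical code, not taken from
> Ratis) mimics this race: one writer thread appends entries under the
> writeLock while several reader threads, standing in for StateMachineUpdater
> and the per-follower GrpcLogAppenders, take the readLock once per entry:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class LockContentionDemo {
>   private static final int ENTRIES = 1_000_000;
>   private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();
>   private static final List<Long> LOG = new ArrayList<>();
>
>   public static void main(String[] args) throws InterruptedException {
>     // Writer: appends entries as fast as possible, each under the writeLock.
>     Thread writer = new Thread(() -> {
>       for (long i = 0; i < ENTRIES; i++) {
>         LOCK.writeLock().lock();
>         try {
>           LOG.add(i);
>         } finally {
>           LOCK.writeLock().unlock();
>         }
>       }
>     }, "writer");
>
>     // Readers: one per "follower" plus one "state machine updater", each
>     // taking the readLock once per entry read, racing with the writer.
>     List<Thread> readers = new ArrayList<>();
>     for (int r = 0; r < 3; r++) {
>       readers.add(new Thread(() -> {
>         int next = 0;
>         while (next < ENTRIES) {
>           LOCK.readLock().lock();
>           try {
>             if (next < LOG.size()) {
>               LOG.get(next++);
>             }
>           } finally {
>             LOCK.readLock().unlock();
>           }
>         }
>       }, "reader-" + r));
>     }
>
>     long start = System.nanoTime();
>     writer.start();
>     readers.forEach(Thread::start);
>     writer.join();
>     for (Thread t : readers) {
>       t.join();
>     }
>     System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
>   }
> }
> {code}
> Adding more readers (i.e. more followers) increases the time the writer
> waits for the writeLock, illustrating the contention described above.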
> The writer, StateMachineUpdater, and GrpcLogAppender are all designed to
> maximize their throughput. For instance, StateMachineUpdater invokes
> StateMachine's applyTransaction asynchronously, and GrpcLogAppender replicates
> log entries to the followers in the same way.
> The global ReadWriteLock *creates tough contention* between the RaftLog
> writers and readers, and that is what limits Ratis throughput. The faster the
> writers and readers are, the more they block each other.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)