[
https://issues.apache.org/jira/browse/RATIS-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duong updated RATIS-2129:
-------------------------
Description:
Today, the GrpcLogAppender thread makes many calls that require RaftLog's
readLock. In an active environment, RaftLog is always busy appending
transactions from clients, so the writeLock is frequently held. This makes
replication slow.
See [^dn_echo_leader_profile.html], or the picture below, where the purple
regions show the time spent acquiring the readLock from RaftLog.
!image-2024-07-22-15-25-46-155.png|width=854,height=425!
h2. A summary of lock contention in Ratis
!ratis_ratfLog_lock_contention.png|width=392,height=380!
Today, RaftLog consistency is protected by a global ReadWriteLock (global
meaning that each RaftLog instance, i.e. each RaftGroup, has a single
ReadWriteLock, acquired at the scope of the whole instance).
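As a minimal sketch of this locking scheme (the class and method names below
are hypothetical stand-ins, not the actual Ratis code), a log guarded by a
single ReentrantReadWriteLock looks roughly like this:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for RaftLog: one ReadWriteLock
// protects every append and every read at the scope of the whole log.
class GloballyLockedLog {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> entries = new ArrayList<>();

  // Writer path: every append takes the writeLock.
  void append(String entry) {
    lock.writeLock().lock();
    try {
      entries.add(entry);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader path (StateMachineUpdater, GrpcLogAppender): reading a single
  // entry takes the readLock, so it stalls whenever the writeLock is held.
  String get(int index) {
    lock.readLock().lock();
    try {
      return entries.get(index);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}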
In a RaftGroup, the following actors race to obtain this global ReadWriteLock
in the leader node:
* The writer, which is the GRPC Client Service, accepts transaction
submissions from Raft clients and appends the transactions (or log entries)
to RaftLog. Each append operation must acquire the writeLock from RaftLog to
put the transaction into RaftLog's memory queue. Although each individual
append is quick, Ratis is designed to maximize transaction append throughput,
so the writeLock is almost always busy.
* StateMachineUpdater. Once a transaction is acknowledged by enough
followers, this single-threaded actor reads the corresponding log entry from
RaftLog and calls the StateMachine to apply the transaction. It acquires the
readLock from RaftLog for every log entry it reads.
* GrpcLogAppender: for each follower, there is a GrpcLogAppender thread that
constantly reads log entries from RaftLog and replicates them to that
follower. It acquires the readLock from RaftLog every time it reads a log
entry.
The writer, StateMachineUpdater, and GrpcLogAppender are all designed to
maximize their throughput. For instance, StateMachineUpdater invokes the
StateMachine's applyTransaction asynchronously, and GrpcLogAppender
replicates log entries to its follower in the same way.
The global ReadWriteLock *creates tough contention* between the RaftLog
writers and readers, and that is what limits Ratis throughput: the faster the
writers and readers are, the more they block each other.
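The toy driver below (illustrative only, reusing the hypothetical
GloballyLockedLog sketch above, not actual Ratis code) mimics the three
actors: one writer appending as fast as it can, plus two per-entry readers
standing in for StateMachineUpdater and a GrpcLogAppender. Every get() must
wait whenever the writer holds the writeLock, which is the contention
described above.
{code:java}
// Toy driver (not Ratis code): one writer races two per-entry readers on
// the GloballyLockedLog sketched above.
public class ContentionDemo {
  private static final int NUM_ENTRIES = 1_000_000;

  public static void main(String[] args) throws InterruptedException {
    final GloballyLockedLog log = new GloballyLockedLog();

    Thread writer = new Thread(() -> {
      for (int i = 0; i < NUM_ENTRIES; i++) {
        log.append("entry-" + i); // one writeLock acquisition per append
      }
    });

    Runnable readerTask = () -> {
      for (int i = 0; i < NUM_ENTRIES; i++) {
        try {
          log.get(i);             // one readLock acquisition per entry
        } catch (IndexOutOfBoundsException e) {
          i--;                    // entry not appended yet; retry this index
        }
      }
    };
    Thread stateMachineUpdater = new Thread(readerTask);
    Thread grpcLogAppender = new Thread(readerTask);

    writer.start();
    stateMachineUpdater.start();
    grpcLogAppender.start();
    writer.join();
    stateMachineUpdater.join();
    grpcLogAppender.join();
    // Profiling this shows readers blocked on the readLock whenever the
    // writer holds the writeLock, analogous to the purple regions in the
    // attached profile.
  }
}
{code}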
> Low replication performance because of lock contention on RaftLog
> -----------------------------------------------------------------
>
> Key: RATIS-2129
> URL: https://issues.apache.org/jira/browse/RATIS-2129
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 3.1.0
> Reporter: Duong
> Assignee: Tsz-wo Sze
> Priority: Blocker
>                 Labels: performance
> Attachments: Screenshot 2024-07-22 at 4.40.07 PM-1.png, Screenshot
> 2024-07-22 at 4.40.07 PM.png, dn_echo_leader_profile.html,
> image-2024-07-22-15-25-46-155.png, ratis_ratfLog_lock_contention.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>