[
https://issues.apache.org/jira/browse/RATIS-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868513#comment-17868513
]
Duong commented on RATIS-2129:
------------------------------
One regression comes from RATIS-2099. It added more work in appendTransaction,
so the writeLock is held for longer, which blocks GrpcLogAppender more often
when it tries to acquire the readLock.
!Screenshot 2024-07-22 at 4.40.07 PM-1.png|width=1063,height=392!
I tested reverting it, and that improved Ratis performance by ~10%.
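To illustrate why more work inside the write-locked section hurts readers, here is a minimal sketch. It is not Ratis code and does not reproduce the RATIS-2099 change; the class WriteLockScopeSketch, its methods, and the prepare step are all hypothetical. Whether the extra work in appendTransaction could actually be moved outside the lock is an assumption that would need checking against the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical log; illustrates write-lock hold time only, not the Ratis implementation. */
public class WriteLockScopeSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> entries = new ArrayList<>();
  // Records whether the last prepare() call ran while this thread held the write lock.
  private boolean preparedUnderWriteLock;

  /** Stand-in for extra per-transaction work (assumed, e.g. building an entry). */
  private String prepare(String payload) {
    preparedUnderWriteLock = lock.isWriteLockedByCurrentThread();
    return "entry:" + payload;
  }

  /** Extra work inside the critical section: readers wait for it as well. */
  public void appendPreparingInsideLock(String payload) {
    lock.writeLock().lock();
    try {
      entries.add(prepare(payload));
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Extra work before locking: the write lock covers only the list update. */
  public void appendPreparingOutsideLock(String payload) {
    String entry = prepare(payload);
    lock.writeLock().lock();
    try {
      entries.add(entry);
    } finally {
      lock.writeLock().unlock();
    }
  }

  public boolean wasPreparedUnderWriteLock() {
    return preparedUnderWriteLock;
  }

  public int size() {
    lock.readLock().lock();
    try {
      return entries.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    WriteLockScopeSketch log = new WriteLockScopeSketch();
    log.appendPreparingInsideLock("a");
    System.out.println("inside lock: " + log.wasPreparedUnderWriteLock());
    log.appendPreparingOutsideLock("b");
    System.out.println("outside lock: " + log.wasPreparedUnderWriteLock());
  }
}
```

In the first variant, any thread calling size() is blocked for the full duration of prepare; in the second, only for the list update.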
> Low replication performance because GrpcLogAppender is often blocked by
> RaftLog's readLock
> ----------------------------------------------------------------------------------------------
>
> Key: RATIS-2129
> URL: https://issues.apache.org/jira/browse/RATIS-2129
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 3.1.0
> Reporter: Duong
> Priority: Major
> Labels: Performance, performance
> Attachments: Screenshot 2024-07-22 at 4.40.07 PM-1.png, Screenshot
> 2024-07-22 at 4.40.07 PM.png, dn_echo_leader_profile.html,
> image-2024-07-22-15-25-46-155.png
>
>
> Today, the GrpcLogAppender thread makes a lot of calls that need RaftLog's
> readLock. In an active environment, RaftLog is always busy appending
> transactions from clients, so the writeLock is frequently held. This slows
> down replication.
> See [^dn_echo_leader_profile.html] or the picture below. The purple area is
> the time spent acquiring the readLock from RaftLog.
> !image-2024-07-22-15-25-46-155.png|width=854,height=425!
> So far, I'm not sure whether this is a regression from a recent change in
> 3.1.0/3.0.0, or whether it has always been the case.
> A few early considerations:
> # The rate of calls to RaftLog per GrpcLogAppender seems to be too high.
> Instead of calling RaftLog multiple times, maybe the log appender can call
> once to obtain all the required information?
> # Can RaftLog expose those data without requiring a read lock?
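The two considerations listed above can be sketched as follows. This is only an illustration under stated assumptions: LogSnapshotSketch, Snapshot, and every method name here are hypothetical and are not the Ratis RaftLog API. Option 1 returns all values from a single readLock acquisition; option 2 publishes an immutable snapshot through a volatile field so that readers take no lock at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a log guarded by a read-write lock; not the Ratis RaftLog API. */
public class LogSnapshotSketch {
  /** Immutable view of the values an appender might need for one round (assumed fields). */
  public static final class Snapshot {
    public final long lastIndex;
    public final long lastTerm;

    Snapshot(long lastIndex, long lastTerm) {
      this.lastIndex = lastIndex;
      this.lastTerm = lastTerm;
    }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Position i holds the term of the entry at index i.
  private final List<Long> terms = new ArrayList<>();
  // Replaced after every append so that readers can skip the lock entirely.
  private volatile Snapshot published = new Snapshot(-1, -1);

  public void append(long term) {
    lock.writeLock().lock();
    try {
      terms.add(term);
      published = new Snapshot(terms.size() - 1, term);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Option 1: one readLock acquisition returns everything at once. */
  public Snapshot snapshotUnderReadLock() {
    lock.readLock().lock();
    try {
      int n = terms.size();
      return n == 0 ? new Snapshot(-1, -1) : new Snapshot(n - 1, terms.get(n - 1));
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Option 2: lock-free read of the most recently published snapshot. */
  public Snapshot snapshotLockFree() {
    return published;
  }

  public static void main(String[] args) {
    LogSnapshotSketch log = new LogSnapshotSketch();
    log.append(1);
    log.append(1);
    log.append(2);
    Snapshot a = log.snapshotUnderReadLock();
    Snapshot b = log.snapshotLockFree();
    System.out.println(a.lastIndex + "/" + a.lastTerm + " " + b.lastIndex + "/" + b.lastTerm);
  }
}
```

Because Snapshot is immutable and swapped in as a whole, a lock-free reader always sees a consistent pair of values, although that pair may lag slightly behind an append that is still in progress. Whether such staleness is acceptable for the appender is an open question for the real code.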
--
This message was sent by Atlassian Jira
(v8.20.10#820010)