[jira] [Updated] (RATIS-2392) Leader should trigger heartbeat immediately after ReadIndex

Ivan Andika (Jira) Wed, 04 Feb 2026 01:08:04 -0800


     [ 
https://issues.apache.org/jira/browse/RATIS-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Andika updated RATIS-2392:
-------------------------------
    Description: 
This issue was found while debugging the slow 
{{TestOzoneShellHAWithFollowerRead}} (it ran for as long as 10 minutes, whereas 
{{TestOzoneShellHA}} runs for only 2 minutes). The latency of 
{{OzoneManagerProtocolServerSideTranslatorPB#submitReadRequestToOM}} was 
observed to be around 500ms for some read requests, which is unacceptably long 
and exceeds disk latency. High ReadIndex network latency can be ruled out 
since the test is run locally.

After a long investigation, the latency was traced to the follower's 
{{ReadRequests#waitForAdvance}}. The actual bottleneck on the follower is 
{{StateMachineUpdater#waitForCommit}}, not any of the earlier hypotheses: 
1) slow follower {{StateMachine#applyTransactions}}, 2) the {{ReadIndex}} 
network communication, 3) the leader's {{ReadIndex}} latency (which should 
already be solved by RATIS-2379 and RATIS-2382).

From the debug logs, the root cause is that the follower has not yet seen the 
leader's latest commitIndex (e.g. the leader's commitIndex is 10, but the 
follower's commitIndex is 9), so the follower cannot advance its commitIndex 
and apply transactions up to the higher commitIndex (see 
{{StateMachineUpdater#waitForCommit}}). The follower is therefore stuck 
waiting in {{StateMachineUpdater#waitForCommit}} until it receives an 
AppendEntries from the leader with leaderCommit >= readIndex, because the 
leader's commitIndex is only included in {{AppendEntries}}.
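
To make the stall concrete, here is a minimal toy model in Java. It is not 
Ratis code: {{FollowerModel}}, {{onAppendEntries}} and {{canServeRead}} are 
hypothetical names, and applying committed entries is assumed to be 
instantaneous so that only the commitIndex gap is visible.

```java
/** Toy model of a follower; hypothetical names, not the Ratis implementation. */
final class FollowerModel {
    private long commitIndex;   // highest index the follower knows to be committed
    private long appliedIndex;  // highest index applied to the state machine

    FollowerModel(long commitIndex) {
        this.commitIndex = commitIndex;
        this.appliedIndex = commitIndex;
    }

    /** The follower learns the leader's commitIndex only from AppendEntries (heartbeats included). */
    void onAppendEntries(long leaderCommit) {
        commitIndex = Math.max(commitIndex, leaderCommit);
        // Assumption: applying is instantaneous, so appliedIndex catches up at once.
        appliedIndex = commitIndex;
    }

    /** A read at readIndex can be served only once appliedIndex has reached it. */
    boolean canServeRead(long readIndex) {
        return appliedIndex >= readIndex;
    }

    long getCommitIndex() {
        return commitIndex;
    }
}

final class StaleCommitDemo {
    public static void main(String[] args) {
        FollowerModel follower = new FollowerModel(9); // follower's commitIndex is 9
        long readIndex = 10;                           // leader's commitIndex from ReadIndex

        // The read is blocked: nothing has told the follower that index 10 is committed.
        System.out.println("before AppendEntries: " + follower.canServeRead(readIndex));

        // The next AppendEntries (or heartbeat) carries leaderCommit = 10 and unblocks it.
        follower.onAppendEntries(10);
        System.out.println("after AppendEntries: " + follower.canServeRead(readIndex));
    }
}
```

In this model the ReadIndex reply alone never changes {{canServeRead}}; only 
the AppendEntries does, which matches the wait described above.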

One solution is to trigger a heartbeat / AppendEntries to the follower 
immediately after the ReadIndex is returned. I also considered allowing an 
{{AppendEntriesRequestProto}} to be added to the {{ReadIndexReplyProto}} to 
reduce the number of RPC calls, but this can cause subtle bugs and increase 
latency further (the follower needs to process and reply to the AppendEntries; 
otherwise the leader has to keep resending it).
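
A rough sketch of why the immediate heartbeat helps on an idle group, written 
as a small Java timing model. Everything in it is an assumption for 
illustration: heartbeats are taken to fire at exact multiples of a fixed 
interval, no other AppendEntries is in flight, and the interval and delay 
values are made up rather than taken from Ratis defaults or from the test.

```java
/** Toy timing model; hypothetical names and made-up numbers, not measured data. */
final class HeartbeatWaitModel {
    private HeartbeatWaitModel() {
    }

    /**
     * Time the follower waits for the next periodic heartbeat, assuming heartbeats
     * fire at multiples of intervalMs and no other AppendEntries is sent.
     */
    static long waitWithPeriodicHeartbeat(long readIndexReplyAtMs, long intervalMs) {
        long sinceLastHeartbeat = readIndexReplyAtMs % intervalMs;
        return sinceLastHeartbeat == 0 ? 0 : intervalMs - sinceLastHeartbeat;
    }

    /** With a heartbeat triggered right after ReadIndex, only the one-way delay remains. */
    static long waitWithImmediateHeartbeat(long oneWayDelayMs) {
        return oneWayDelayMs;
    }

    public static void main(String[] args) {
        long intervalMs = 500; // illustrative heartbeat interval
        long replyAtMs = 120;  // illustrative time at which the ReadIndex reply is sent
        System.out.println("periodic: "
            + waitWithPeriodicHeartbeat(replyAtMs, intervalMs) + " ms");
        System.out.println("immediate: "
            + waitWithImmediateHeartbeat(1) + " ms");
    }
}
```

Under these assumptions the periodic case waits up to one full interval, while 
the immediate case waits only for the message to arrive. On a busy group the 
model does not apply, since regular AppendEntries traffic already carries the 
leaderCommit.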

After the improvement, the test time goes down from 10 minutes to 2 minutes 
(similar to {{TestOzoneShellHA}}). However, when I benchmarked the performance, 
there were no significant improvements. I suspect the improvement is largest 
when the Ratis group is not busy (i.e. there are not many AppendEntries), since 
otherwise one of those AppendEntries will already carry the leaderCommit. 

  was:
This issue was found while debugging the slow 
{{TestOzoneShellHAWithFollowerRead}} (it ran for as long as 10 minutes, whereas 
{{TestOzoneShellHA}} runs for only 2 minutes). The latency of 
{{OzoneManagerProtocolServerSideTranslatorPB#submitReadRequestToOM}} was 
observed to be around 500ms for some read requests, which is unacceptably long 
and exceeds disk latency. High ReadIndex network latency can be ruled out 
since the test is run locally.

After a long investigation, the latency was traced to the follower's 
{{ReadRequests#waitForAdvance}}. The actual bottleneck on the follower is 
{{StateMachineUpdater#waitForCommit}}, not any of the earlier hypotheses: 
1) slow follower {{StateMachine#applyTransactions}}, 2) the {{ReadIndex}} 
network communication, 3) the leader's {{ReadIndex}} latency (which should 
already be solved by RATIS-2379 and RATIS-2382).

From the debug logs, the root cause is that the follower has not yet seen the 
leader's latest commitIndex (e.g. the leader's commitIndex is 10, but the 
follower's commitIndex is 9), so the follower cannot advance its commitIndex 
and apply transactions up to the higher commitIndex (see 
{{StateMachineUpdater#waitForCommit}}). The follower is therefore stuck 
waiting in {{StateMachineUpdater#waitForCommit}} until it receives an 
AppendEntries from the leader with leaderCommit >= readIndex, because the 
leader's commitIndex is only included in {{AppendEntries}}.

One solution is to trigger a heartbeat / AppendEntries to the follower 
immediately after the ReadIndex is returned. I also considered allowing an 
{{AppendEntriesRequestProto}} to be added to the {{ReadIndexReplyProto}} to 
reduce the number of RPC calls, but this can cause subtle bugs and increase 
latency further (the follower needs to process and reply to the AppendEntries; 
otherwise the leader has to keep resending it).

After the improvement, the test time goes down from 10 minutes to 2 minutes 
(similar to {{TestOzoneShellHA}}). However, I suspect the improvement is 
largest when the Ratis group is not busy (i.e. there are not many 
AppendEntries), since otherwise one of those AppendEntries will already carry 
the leaderCommit. 


> Leader should trigger heartbeat immediately after ReadIndex
> -----------------------------------------------------------
>
>                 Key: RATIS-2392
>                 URL: https://issues.apache.org/jira/browse/RATIS-2392
>             Project: Ratis
>          Issue Type: Improvement
>          Components: Linearizable Read, performance
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: image-2026-02-04-17-01-22-314.png, 
> image-2026-02-04-17-01-50-676.png, image-2026-02-04-17-02-15-168.png
>
>
> This issue was found while debugging the slow 
> {{TestOzoneShellHAWithFollowerRead}} (it ran for as long as 10 minutes, 
> whereas {{TestOzoneShellHA}} runs for only 2 minutes). The latency of 
> {{OzoneManagerProtocolServerSideTranslatorPB#submitReadRequestToOM}} was 
> observed to be around 500ms for some read requests, which is unacceptably 
> long and exceeds disk latency. High ReadIndex network latency can be ruled 
> out since the test is run locally.
>
> After a long investigation, the latency was traced to the follower's 
> {{ReadRequests#waitForAdvance}}. The actual bottleneck on the follower is 
> {{StateMachineUpdater#waitForCommit}}, not any of the earlier hypotheses: 
> 1) slow follower {{StateMachine#applyTransactions}}, 2) the {{ReadIndex}} 
> network communication, 3) the leader's {{ReadIndex}} latency (which should 
> already be solved by RATIS-2379 and RATIS-2382).
>
> From the debug logs, the root cause is that the follower has not yet seen 
> the leader's latest commitIndex (e.g. the leader's commitIndex is 10, but 
> the follower's commitIndex is 9), so the follower cannot advance its 
> commitIndex and apply transactions up to the higher commitIndex (see 
> {{StateMachineUpdater#waitForCommit}}). The follower is therefore stuck 
> waiting in {{StateMachineUpdater#waitForCommit}} until it receives an 
> AppendEntries from the leader with leaderCommit >= readIndex, because the 
> leader's commitIndex is only included in {{AppendEntries}}.
>
> One solution is to trigger a heartbeat / AppendEntries to the follower 
> immediately after the ReadIndex is returned. I also considered allowing an 
> {{AppendEntriesRequestProto}} to be added to the {{ReadIndexReplyProto}} to 
> reduce the number of RPC calls, but this can cause subtle bugs and increase 
> latency further (the follower needs to process and reply to the 
> AppendEntries; otherwise the leader has to keep resending it).
>
> After the improvement, the test time goes down from 10 minutes to 2 minutes 
> (similar to {{TestOzoneShellHA}}). However, when I benchmarked the 
> performance, there were no significant improvements. I suspect the 
> improvement is largest when the Ratis group is not busy (i.e. there are not 
> many AppendEntries), since otherwise one of those AppendEntries will already 
> carry the leaderCommit. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
