Ivan Andika created RATIS-2392:
----------------------------------

             Summary: Leader should trigger heartbeat immediately after 
ReadIndex
                 Key: RATIS-2392
                 URL: https://issues.apache.org/jira/browse/RATIS-2392
             Project: Ratis
          Issue Type: Improvement
          Components: Linearizable Read, performance
            Reporter: Ivan Andika
            Assignee: Ivan Andika


This issue was found while debugging a slow {{TestOzoneShellHAWithFollowerRead}} 
(it ran for as long as 10 minutes, although {{TestOzoneShellHA}} only runs for 
2 minutes). The observed latency of 
{{OzoneManagerProtocolServerSideTranslatorPB#submitReadRequestToOM}} is around 
500ms for some read requests, which is unacceptably long and exceeds disk 
latency. High ReadIndex network latency is ruled out since the test runs 
locally.

After a long investigation, the main latency turned out to be in the follower's 
{{ReadRequests#waitForAdvance}}. Within the follower, the bottleneck is 
{{StateMachineUpdater#waitForCommit}}, not any of the earlier hypotheses: 
1) slow follower {{StateMachine#applyTransactions}}, 2) the {{ReadIndex}} 
network communication, 3) the leader's {{ReadIndex}} latency (which should 
already be solved by RATIS-2379 and RATIS-2382).

From the debug logs, the root cause is that the follower has not yet seen the 
leader's latest commitIndex (e.g. the leader's commitIndex is 10, but the 
follower's commitIndex is 9), so the follower cannot advance its commitIndex 
and apply transactions up to the higher commitIndex (see 
{{StateMachineUpdater#waitForCommit}}). The follower is therefore stuck 
waiting in {{StateMachineUpdater#waitForCommit}} until it receives an 
AppendEntries from the leader with leaderCommit >= readIndex. The leader's 
commitIndex is only included in {{AppendEntries}}.
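
The waiting behaviour described above can be illustrated with a toy model. 
This is only a sketch, not the Ratis implementation: the class 
{{FollowerCommitSketch}} and its methods are hypothetical names, and it assumes 
the follower already holds the log entries, so that receiving leaderCommit is 
the only thing missing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of a follower (hypothetical; not the Ratis classes). */
class FollowerCommitSketch {
  /** A pending read that waits for commitIndex to reach readIndex. */
  private static final class Waiter {
    final long readIndex;
    final CompletableFuture<Long> future = new CompletableFuture<>();

    Waiter(long readIndex) {
      this.readIndex = readIndex;
    }
  }

  private long commitIndex;
  private final List<Waiter> waiters = new ArrayList<>();

  FollowerCommitSketch(long initialCommitIndex) {
    this.commitIndex = initialCommitIndex;
  }

  synchronized long getCommitIndex() {
    return commitIndex;
  }

  /** A read at readIndex can proceed only once commitIndex >= readIndex. */
  synchronized CompletableFuture<Long> waitForCommit(long readIndex) {
    if (commitIndex >= readIndex) {
      return CompletableFuture.completedFuture(commitIndex);
    }
    Waiter waiter = new Waiter(readIndex);
    waiters.add(waiter);
    return waiter.future;
  }

  /** In this model, AppendEntries is the only carrier of leaderCommit. */
  synchronized void onAppendEntries(long leaderCommit) {
    commitIndex = Math.max(commitIndex, leaderCommit);
    for (Iterator<Waiter> it = waiters.iterator(); it.hasNext();) {
      Waiter waiter = it.next();
      if (commitIndex >= waiter.readIndex) {
        it.remove();
        waiter.future.complete(commitIndex);
      }
    }
  }

  public static void main(String[] args) {
    // Leader's commitIndex is 10, follower's commitIndex is still 9.
    FollowerCommitSketch follower = new FollowerCommitSketch(9);
    CompletableFuture<Long> read = follower.waitForCommit(10);
    System.out.println("done before AppendEntries: " + read.isDone());
    follower.onAppendEntries(10);
    System.out.println("done after AppendEntries: " + read.isDone());
  }
}
```

In this model the read stays pending for as long as no AppendEntries arrives, 
which matches the latency seen when the group is otherwise idle.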

One solution is to trigger a heartbeat / AppendEntries to the follower 
immediately after the ReadIndex reply is returned. I previously also considered 
allowing an {{AppendEntriesRequestProto}} to be embedded in the 
{{ReadIndexReplyProto}} to reduce the number of RPC calls, but this can cause 
subtle bugs and further latency increases (the follower needs to process and 
reply to the AppendEntries; otherwise the leader needs to keep resending it).
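
A minimal sketch of the proposed ordering, under stated assumptions: 
{{LeaderReadIndexSketch}}, {{handleReadIndex}} and the two callbacks are 
hypothetical names used only for illustration, not Ratis APIs. The leader 
first replies with the read index and then, if the option is enabled, sends a 
separate AppendEntries carrying its commitIndex.

```java
import java.util.function.LongConsumer;

/** Toy model of a leader (hypothetical; not the Ratis classes). */
class LeaderReadIndexSketch {
  private final long commitIndex;
  private final boolean heartbeatAfterReadIndex;

  LeaderReadIndexSketch(long commitIndex, boolean heartbeatAfterReadIndex) {
    this.commitIndex = commitIndex;
    this.heartbeatAfterReadIndex = heartbeatAfterReadIndex;
  }

  /**
   * Replies to a ReadIndex request, then optionally triggers an
   * AppendEntries (heartbeat) that carries leaderCommit as a separate RPC.
   */
  void handleReadIndex(LongConsumer replyReadIndex, LongConsumer sendAppendEntries) {
    long readIndex = commitIndex;
    replyReadIndex.accept(readIndex);
    if (heartbeatAfterReadIndex) {
      sendAppendEntries.accept(commitIndex);
    }
  }

  public static void main(String[] args) {
    long[] followerCommit = {9};
    long[] readIndex = {-1};
    LongConsumer reply = index -> readIndex[0] = index;
    LongConsumer append = commit -> followerCommit[0] = Math.max(followerCommit[0], commit);

    // Without the trigger, the follower must wait for a later AppendEntries.
    new LeaderReadIndexSketch(10, false).handleReadIndex(reply, append);
    System.out.println("can serve read: " + (followerCommit[0] >= readIndex[0]));

    // With the trigger, the follower learns leaderCommit right away.
    new LeaderReadIndexSketch(10, true).handleReadIndex(reply, append);
    System.out.println("can serve read: " + (followerCommit[0] >= readIndex[0]));
  }
}
```

Keeping the heartbeat as its own RPC, rather than embedding it in the 
ReadIndex reply, leaves the existing AppendEntries reply handling untouched.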

After the improvement, the test time goes down from 10 minutes to 2 minutes 
(similar to {{TestOzoneShellHA}}). However, I suspect the performance 
improvement is largest when the Ratis group is not busy (i.e. there are not 
many AppendEntries), since otherwise one of those AppendEntries would already 
carry the leaderCommit.


