[jira] [Updated] (RATIS-2156) Notify follower slowness based on the log index

Ivan Andika (Jira) Fri, 13 Sep 2024 03:56:07 -0700


     [ 
https://issues.apache.org/jira/browse/RATIS-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ivan Andika updated RATIS-2156:
-------------------------------
    Description: 
Currently, StateMachine.LeaderEventApi#notifyFollowerSlowness is triggered
based on raft.server.rpc.slowness.timeout. We have seen cases where the RPC
round-trip time (RTT) between the leader and a follower does not exceed the
timeout, yet the log index gap between the leader and the follower keeps
increasing, i.e. the slow follower cannot catch up.

In Ozone, this causes most watch requests with ALL_COMMITTED replication to
time out, which increases write latency. It would be better to close the
pipeline if the slow follower cannot catch up.

!image-2024-09-13-18-54-04-203.png|width=1408,height=244!
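
To make the idea concrete, the following is a minimal sketch of a log-index-based
slowness check. It is not the Ratis implementation: the class
LogIndexLagDetector, its methods, and the maxIndexGap threshold are all
hypothetical names introduced here for illustration, and the sketch assumes the
leader can observe each follower's highest replicated (match) index.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Ratis code): flags a follower as slow when its
 * replicated log index trails the leader's last index by more than a threshold.
 */
final class LogIndexLagDetector {
  // Assumed threshold: maximum tolerated gap, in log entries.
  private final long maxIndexGap;
  // Highest log index each follower is known to have replicated.
  private final Map<String, Long> matchIndex = new HashMap<>();

  LogIndexLagDetector(long maxIndexGap) {
    if (maxIndexGap < 0) {
      throw new IllegalArgumentException("maxIndexGap must be >= 0");
    }
    this.maxIndexGap = maxIndexGap;
  }

  /** Records an acknowledgement; stale (lower) indices never lower the stored value. */
  void onFollowerAck(String followerId, long index) {
    matchIndex.merge(followerId, index, Math::max);
  }

  /** Returns how many entries the follower trails the leader by (never negative). */
  long lag(String followerId, long leaderLastIndex) {
    // A follower that has never acknowledged anything is treated as index -1.
    long followerIndex = matchIndex.getOrDefault(followerId, -1L);
    return Math.max(0L, leaderLastIndex - followerIndex);
  }

  /** True when the gap exceeds the threshold, regardless of RPC round-trip time. */
  boolean isSlow(String followerId, long leaderLastIndex) {
    return lag(followerId, leaderLastIndex) > maxIndexGap;
  }
}
```

With a threshold of 100 entries, a follower at index 950 is not slow while the
leader is at index 1000, but becomes slow once the leader reaches index 1051,
even if every RPC to that follower returns within the slowness timeout.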

  was:
Currently, StateMachine.LeaderEventApi#notifyFollowerSlowness is triggered
based on raft.server.rpc.slowness.timeout. We have seen cases where the RPC
round-trip time (RTT) between the leader and a follower does not exceed the
timeout, yet the log index gap between the leader and the follower keeps
increasing, i.e. the slow follower cannot catch up.

In Ozone, this causes most watch requests with ALL_COMMITTED replication to
time out, which increases write latency. It would be better to close the
pipeline if the slow follower cannot catch up.


> Notify follower slowness based on the log index
> -----------------------------------------------
>
>                 Key: RATIS-2156
>                 URL: https://issues.apache.org/jira/browse/RATIS-2156
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major
>         Attachments: image-2024-09-13-18-54-04-203.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
