[ https://issues.apache.org/jira/browse/RATIS-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254716#comment-17254716 ]

runzhiwang edited comment on RATIS-1265 at 12/25/20, 3:18 AM:
--------------------------------------------------------------

[~szetszwo] I think that when the server with the highest priority (s0) rejects
a vote request from another server (such as s1), s0 should become a candidate
and call askForVote immediately. Because s0 has already rejected s1, s1 cannot
become the leader, so s0 starting a leader election right away has no bad
effect, and s0 can become the leader as soon as possible. What do you think?
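A minimal sketch of the proposed rule, in Java since Ratis is a Java project. All names here (`PriorityVoteSketch`, `Peer`, `handleRequestVote`, `startElectionNow`, `onVoteResponse`) are hypothetical and are not the Ratis API; log up-to-date checks, RPCs and timers are omitted. Following the log ("fail because s0 reject"), it also assumes that a candidate abandons its election once a higher-priority peer rejects it, which is what makes "s1 cannot become the leader" hold.

```java
import java.util.Map;

/**
 * Hypothetical sketch of "highest-priority peer rejects, then campaigns at once".
 * None of these names are the Ratis API; log checks, RPCs and timers are omitted.
 */
public class PriorityVoteSketch {

  enum Role { FOLLOWER, CANDIDATE, LEADER }

  static final class Peer {
    final String id;
    final Map<String, Integer> priorities; // assumed: every peer knows all priorities
    Role role = Role.FOLLOWER;
    int currentTerm = 0;
    String votedFor = null;

    Peer(String id, Map<String, Integer> priorities) {
      this.id = id;
      this.priorities = priorities;
    }

    boolean isHighestPriority() {
      int mine = priorities.get(id);
      return priorities.values().stream().allMatch(p -> p <= mine);
    }

    /** Voter side: returns true if the vote is granted. */
    boolean handleRequestVote(String candidateId, int candidateTerm) {
      if (candidateTerm < currentTerm) {
        return false; // stale candidate
      }
      if (candidateTerm > currentTerm) {
        // Standard Raft: adopt the newer term and become a follower.
        currentTerm = candidateTerm;
        votedFor = null;
        role = Role.FOLLOWER;
      }
      if (isHighestPriority() && priorities.get(candidateId) < priorities.get(id)) {
        // Proposed rule: reject the lower-priority candidate and, instead of
        // waiting for our own election timeout, start an election right now.
        startElectionNow();
        return false;
      }
      if (votedFor == null || votedFor.equals(candidateId)) {
        votedFor = candidateId;
        return true;
      }
      return false;
    }

    /** Candidate side (assumed): give up if a higher-priority peer rejects us. */
    void onVoteResponse(String voterId, boolean granted) {
      if (!granted && priorities.get(voterId) > priorities.get(id)) {
        role = Role.FOLLOWER;
      }
    }

    void startElectionNow() {
      role = Role.CANDIDATE;
      currentTerm++;
      votedFor = id; // vote for self; a real server would now send vote requests
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> prio = Map.of("s0", 2, "s1", 1, "s2", 0);
    Peer s0 = new Peer("s0", prio);
    Peer s1 = new Peer("s1", prio);
    Peer s2 = new Peer("s2", prio);

    s1.startElectionNow(); // s1 times out first and campaigns in term 1
    boolean fromS0 = s0.handleRequestVote("s1", s1.currentTerm);
    boolean fromS2 = s2.handleRequestVote("s1", s1.currentTerm);
    s1.onVoteResponse("s0", fromS0);
    s1.onVoteResponse("s2", fromS2);
    System.out.println("s1 role=" + s1.role + ", s0 role=" + s0.role + ", s0 term=" + s0.currentTerm);

    // s0 is already a candidate in term 2 and asks the others for votes.
    boolean s1Grants = s1.handleRequestVote("s0", s0.currentTerm);
    boolean s2Grants = s2.handleRequestVote("s0", s0.currentTerm);
    System.out.println("s0 votes: s1=" + s1Grants + ", s2=" + s2Grants);
  }
}
```

Running `main` prints `s1 role=FOLLOWER, s0 role=CANDIDATE, s0 term=2` and then `s0 votes: s1=true, s2=true`: s0 does not wait for another randomized timeout, and because its term is already higher than s1's, both other peers grant it their votes.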


was (Author: yjxxtd):
[~szetszwo] I think that when the server with the highest priority (s0) rejects
a vote request from another server (such as s1), s0 should become a candidate
and call askForVote immediately. Because s0 has already rejected s1, s1 cannot
become the leader. So s0 can become the leader as soon as possible. What do you
think?

> Fix leader election with priority too slow
> ------------------------------------------
>
>                 Key: RATIS-1265
>                 URL: https://issues.apache.org/jira/browse/RATIS-1265
>             Project: Ratis
>          Issue Type: Sub-task
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>         Attachments: leader_election_slow
>
>
> As the attached log shows, there are 3 servers: s0, s1 and s2, and s2 is the
> leader. We then change s0 to have the highest priority, so s2 calls
> yieldLeaderToHigherPriorityPeer(s0) once s0's log catches up. In
> yieldLeaderToHigherPriorityPeer, s2 steps down.
> But after s2 steps down, which server requests votes first is almost random.
> If s0 does not request votes within a short time, the leader election can
> last a long time.
> As the attached log shows, the election happened 8 times and lasted 14
> seconds, but s0 only tried to start a leader election on the 6th attempt, and
> still could not get the leadership.
> {code:java}
> 2020-12-25 10:11:34,995  s1: start s1@group-241716F733F8-LeaderElection2  failed because s0 rejected
> 2020-12-25 10:11:37,228  s2: start s2@group-241716F733F8-LeaderElection3  failed because s0 rejected
> 2020-12-25 10:11:39,345  s1: start s1@group-241716F733F8-LeaderElection4  failed because s0 rejected
> 2020-12-25 10:11:41,600  s1: start s1@group-241716F733F8-LeaderElection5  failed because s0 rejected
> 2020-12-25 10:11:43,710  s2: start s2@group-241716F733F8-LeaderElection6  failed because s0 rejected
> 2020-12-25 10:11:46,248  s0: start s0@group-241716F733F8-LeaderElection7  failed because s1 started its election about 200ms later, but s1's vote request reached s2 before s0's did, so s1 voted for itself and rejected s0 at 2020-12-25 10:11:47,267, and s2 voted for s1 at 2020-12-25 10:11:46,469 and rejected s0 at 2020-12-25 10:11:47,267
> 2020-12-25 10:11:46,461  s1: start s1@group-241716F733F8-LeaderElection8  failed because s0 rejected
> 2020-12-25 10:11:48,597  s2: start s2@group-241716F733F8-LeaderElection9  failed because s0 rejected
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
