[ 
https://issues.apache.org/jira/browse/RATIS-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765711#comment-17765711
 ] 

Tsz-wo Sze commented on RATIS-1048:
-----------------------------------

[~burcukozkan], thanks a lot for testing Ratis! I somehow overlooked this 
issue.

According to the logs below, s2 did become the leader at one point. However, 
I am not sure why the test failed at the end.
{code:java}
2020-08-30 21:11:56,836 INFO  impl.RaftServerImpl 
(ServerState.java:setLeader(255)) - s2@group-60D9F845C708: change Leader from 
null to s2 at term 2 for becomeLeader, leader elected after 724ms
...
2020-08-30 21:11:56,870 INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - 
s2: start LeaderState
...
2020-08-30 21:11:56,913 INFO  impl.RaftServerImpl 
(ServerState.java:setLeader(255)) - s1@group-60D9F845C708: change Leader from 
null to s2 at term 2 for appendEntries, leader elected after 801ms
2020-08-30 21:11:56,914 INFO  impl.RaftServerImpl 
(ServerState.java:setLeader(255)) - s0@group-60D9F845C708: change Leader from 
null to s2 at term 2 for appendEntries, leader elected after 802ms
...
2020-08-30 21:11:57,236 INFO  ratis.RaftTestUtil 
(RaftTestUtil.java:waitForLeader(104)) - printing ALL groups
  s0:  RUNNING  FOLLOWER s0@group-60D9F845C708:t2, leader=s2, voted=s2, 
raftlog=s0@group-60D9F845C708-SegmentedRaftLog:OPENED:c0,f0,i0, conf=0: 
[s0:0.0.0.0:44625, s1:0.0.0.0:43949, s2:0.0.0.0:42577], old=null RUNNING
  s1:  RUNNING  FOLLOWER s1@group-60D9F845C708:t2, leader=s2, voted=s2, 
raftlog=s1@group-60D9F845C708-SegmentedRaftLog:OPENED:c0,f0,i0, conf=0: 
[s0:0.0.0.0:44625, s1:0.0.0.0:43949, s2:0.0.0.0:42577], old=null RUNNING
  s2:  RUNNING    LEADER s2@group-60D9F845C708:t2, leader=s2, voted=s2, 
raftlog=s2@group-60D9F845C708-SegmentedRaftLog:OPENED:c0,f0,i0, conf=0: 
[s0:0.0.0.0:44625, s1:0.0.0.0:43949, s2:0.0.0.0:42577], old=null RUNNING
{code}

> Ratis cluster fails to elect a leader when some messages are dropped
> --------------------------------------------------------------------
>
>                 Key: RATIS-1048
>                 URL: https://issues.apache.org/jira/browse/RATIS-1048
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Burcu Ozkan
>            Priority: Major
>         Attachments: logs.txt, msgs.txt
>
>
> I am testing fault tolerance of Ratis, more specifically whether it can 
> tolerate random message losses. Simply, I drop some of the messages and do 
> not deliver them to the recipient. 
> In some tests, I observe executions in which the Ratis servers cannot elect a 
> leader. The servers repeatedly start leader elections, but none of them 
> succeeds.
> You can find the execution logs together with the list of exchanged and 
> dropped messages (the messages marked by "-D" are dropped) in the attachments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
