[ 
https://issues.apache.org/jira/browse/IGNITE-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495714#comment-17495714
 ] 

Vladislav Pyatkov commented on IGNITE-15157:
--------------------------------------------

In the most recent failure (log attached), the cause is a leader change:
{noformat}
[12:33:26]W:             [org.apache.ignite:ignite-raft] 2022-02-02 
12:33:16:294 +0300 [WARNING][%172.17.0.5:5004%JRaft-StepDownTimer-3][NodeImpl] 
Node <CliServiceTest/172.17.0.5:5004> steps down when alive nodes don't satisfy 
quorum, term=1, deadNodes=172.17.0.5:5003,172.17.0.5:5005, 
conf=172.17.0.5:5003,172.17.0.5:5004,172.17.0.5:5005,172.17.0.5:5103/learner,172.17.0.5:5104/learner.
[12:33:26]W:             [org.apache.ignite:ignite-raft] 2022-02-02 
12:33:16:294 +0300 
[INFO][%172.17.0.5:5004%JRaft-FSMCaller-Disruptor-_stripe_57-0][StateMachineAdapter]
 onLeaderStop: status=Status[ERAFTTIMEDOUT<10001>: Majority of the group dies: 
2/3].
{noformat}
This possibly happens because the election timeout for the test cluster is 
300 ms (the default is 1 second):
{code}
cluster = new TestCluster(groupId, dataPath.toString(), peers, learners, 300, 
testInfo);
{code}
Also, I saw a gap of about 350 ms in the log output:
{noformat}
[12:33:26]W:             [org.apache.ignite:ignite-raft] 2022-02-02 
12:33:15:943 +0300 
[INFO][%172.17.0.5:5004%JRaft-FSMCaller-Disruptor-_stripe_57-0][Replicator] 
Replicator Replicator [state=Replicate, statInfo=<running=IDLE, 
firstLogIndex=25, lastLogIncluded=0, lastLogIndex=25, lastTermIncluded=0>, 
peerId=172.17.0.5:5006, type=Follower] is going to quit
[12:33:26]W:             [org.apache.ignite:ignite-raft] 2022-02-02 
12:33:16:294 +0300 [INFO][%172.17.0.5:5003%JRaft-ElectionTimer-3][NodeImpl] 
Node <CliServiceTest/172.17.0.5:5003> term 1 start preVote.
{noformat}
I think a GC pause took place here.
I increased the election timeout to the default and unmuted the test.
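
To make the timing argument concrete, here is a minimal sketch that measures the gap between the two log timestamps quoted above and compares it with both election timeouts. The class LogGap and its methods are hypothetical helpers written for illustration only; they are not part of Ignite or JRaft, and the comparison is a simplification of how JRaft actually decides to step down.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for illustration; not part of Ignite or JRaft. */
final class LogGap {
    // Time-of-day layout used in the log lines above, e.g. 12:33:15:943.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss:SSS");

    /** Returns the number of milliseconds between two log timestamps. */
    static long gapMillis(String earlier, String later) {
        LocalTime from = LocalTime.parse(earlier, LOG_TIME);
        LocalTime to = LocalTime.parse(later, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    /** Simplified check: a silent period longer than the election timeout can trigger an election. */
    static boolean exceedsElectionTimeout(long gapMillis, long electionTimeoutMillis) {
        return gapMillis > electionTimeoutMillis;
    }

    public static void main(String[] args) {
        // Timestamps taken from the log excerpt above.
        long gap = gapMillis("12:33:15:943", "12:33:16:294");
        System.out.println("gap = " + gap + " ms");
        System.out.println("exceeds 300 ms timeout: " + exceedsElectionTimeout(gap, 300));
        System.out.println("exceeds 1000 ms timeout: " + exceedsElectionTimeout(gap, 1000));
    }
}
```

Under these assumptions the gap is longer than the 300 ms timeout used by the test but well within the 1 second default, which is consistent with the failure disappearing after the timeout was raised.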

> ITCliServiceTest.testAddPeerRemovePeer is flaky
> -----------------------------------------------
>
>                 Key: IGNITE-15157
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15157
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: _Integration_Tests_Module_Raft_3469.log.zip
>
>
> [https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_IntegrationTests/6094143]
> {code:java}
> [09:50:48]W:  [Step 2/2] [ERROR] org.apache.ignite.raft.jraft.core.ITCliServiceTest.testAddPeerRemovePeer  Time elapsed: 22.237 s  <<< FAILURE!
> [09:50:48] :  [Step 2/2] org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
> [09:50:48] :  [Step 2/2]  at org.apache.ignite.raft.jraft.core.ITCliServiceTest.testAddPeerRemovePeer(ITCliServiceTest.java:273)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
