[
https://issues.apache.org/jira/browse/IGNITE-15157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495714#comment-17495714
]
Vladislav Pyatkov commented on IGNITE-15157:
--------------------------------------------
In the latest failure (log attached), the cause is a leader change:
{noformat}
[12:33:26]W: [org.apache.ignite:ignite-raft] 2022-02-02
12:33:16:294 +0300 [WARNING][%172.17.0.5:5004%JRaft-StepDownTimer-3][NodeImpl]
Node <CliServiceTest/172.17.0.5:5004> steps down when alive nodes don't satisfy
quorum, term=1, deadNodes=172.17.0.5:5003,172.17.0.5:5005,
conf=172.17.0.5:5003,172.17.0.5:5004,172.17.0.5:5005,172.17.0.5:5103/learner,172.17.0.5:5104/learner.
[12:33:26]W: [org.apache.ignite:ignite-raft] 2022-02-02
12:33:16:294 +0300
[INFO][%172.17.0.5:5004%JRaft-FSMCaller-Disruptor-_stripe_57-0][StateMachineAdapter]
onLeaderStop: status=Status[ERAFTTIMEDOUT<10001>: Majority of the group dies:
2/3].
{noformat}
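The "2/3" in the status message can be read as a plain majority check over the voting peers. The sketch below is only an illustration, not JRaft code: the class and method names are hypothetical, and it assumes that the two learners in the configuration do not count toward the quorum and that the leader counts itself as alive.

```java
import java.util.List;

// Hypothetical helper, not part of JRaft: illustrates the majority rule behind the step-down.
final class QuorumCheck {
    /** Majority needed among voting peers; learners are assumed not to be counted. */
    static int quorum(int votingPeers) {
        return votingPeers / 2 + 1;
    }

    /** True when the number of voting peers not marked dead still reaches a majority. */
    static boolean hasQuorum(List<String> votingPeers, List<String> deadPeers) {
        long alive = votingPeers.stream().filter(p -> !deadPeers.contains(p)).count();
        return alive >= quorum(votingPeers.size());
    }

    public static void main(String[] args) {
        // Voting peers and dead nodes taken from the log line above.
        List<String> voters = List.of("172.17.0.5:5003", "172.17.0.5:5004", "172.17.0.5:5005");
        List<String> dead = List.of("172.17.0.5:5003", "172.17.0.5:5005");
        System.out.println("quorum=" + quorum(voters.size()) + ", hasQuorum=" + hasQuorum(voters, dead));
    }
}
```

With three voters and two of them marked dead, only the leader itself remains, which is below the majority of two, so stepping down is the expected reaction.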
This possibly happens because the election timeout for the test cluster is
300 ms (the default is 1 second):
{code}
cluster = new TestCluster(groupId, dataPath.toString(), peers, learners, 300,
testInfo);
{code}
Also, I saw a gap of about 350 ms in the log output:
{noformat}
[12:33:26]W: [org.apache.ignite:ignite-raft] 2022-02-02
12:33:15:943 +0300
[INFO][%172.17.0.5:5004%JRaft-FSMCaller-Disruptor-_stripe_57-0][Replicator]
Replicator Replicator [state=Replicate, statInfo=<running=IDLE,
firstLogIndex=25, lastLogIncluded=0, lastLogIndex=25, lastTermIncluded=0>,
peerId=172.17.0.5:5006, type=Follower] is going to quit
[12:33:26]W: [org.apache.ignite:ignite-raft] 2022-02-02
12:33:16:294 +0300 [INFO][%172.17.0.5:5003%JRaft-ElectionTimer-3][NodeImpl]
Node <CliServiceTest/172.17.0.5:5003> term 1 start preVote.
{noformat}
I think a GC pause took place here.
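As a rough check of that gap, the two timestamps from the log excerpt can be subtracted and compared with the 300 ms election timeout. This is a minimal sketch; the class and method names are hypothetical, and the time pattern is inferred from the log lines rather than taken from the logger configuration.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for reading the log excerpt; not part of the test suite.
final class LogGap {
    // The log prints milliseconds after a colon, e.g. 12:33:15:943.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss:SSS");

    /** Milliseconds elapsed between two log timestamps on the same day. */
    static long gapMillis(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier, FMT), LocalTime.parse(later, FMT)).toMillis();
    }

    public static void main(String[] args) {
        long gap = gapMillis("12:33:15:943", "12:33:16:294");
        long electionTimeoutMs = 300; // value passed to TestCluster in the test
        System.out.println("gap=" + gap + " ms, exceeds election timeout: " + (gap > electionTimeoutMs));
    }
}
```

The gap comes out at 351 ms, which is longer than the 300 ms timeout but well below the 1 second default.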
I increased the election timeout to the default value and unmuted the test.
> ITCliServiceTest.testAddPeerRemovePeer is flaky
> -----------------------------------------------
>
> Key: IGNITE-15157
> URL: https://issues.apache.org/jira/browse/IGNITE-15157
> Project: Ignite
> Issue Type: Bug
> Reporter: Ivan Bessonov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
> Attachments: _Integration_Tests_Module_Raft_3469.log.zip
>
>
> [https://ci.ignite.apache.org/buildConfiguration/ignite3_Test_IntegrationTests_IntegrationTests/6094143]
> {code:java}
> [09:50:48]W: [Step 2/2] [ERROR] org.apache.ignite.raft.jraft.core.ITCliServiceTest.testAddPeerRemovePeer Time elapsed: 22.237 s <<< FAILURE!
> [09:50:48] : [Step 2/2] org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
> [09:50:48] : [Step 2/2] at org.apache.ignite.raft.jraft.core.ITCliServiceTest.testAddPeerRemovePeer(ITCliServiceTest.java:273)
> {code}