[ https://issues.apache.org/jira/browse/IGNITE-27905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-27905:
---------------------------------
Description:
{noformat}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
at app//org.apache.ignite.raft.jraft.core.ItNodeTest.testSetPeer2(ItNodeTest.java:2087){noformat}
TeamCity:
[https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3x_Test_IntegrationTests_Raft/10064428]
*The root cause:*
When {{cluster.start(peer, emptyPeers=true, ...)}} is called to restart a follower, the {{emptyPeers}} flag controlled both the Raft initial configuration and the ScaleCube gossip seed members:
{code:java}
List<NetworkAddress> addressList = List.of(); // seeds stay empty unless the branch below runs
if (!emptyPeers) {
    addressList = ...all peers...;     // seeds only set when NOT emptyPeers
    nodeOptions.setInitialConf(...);   // Raft conf only set when NOT emptyPeers
}
{code}
The intent of {{emptyPeers=true}} is to restart a node without a pre-configured Raft peer list, so that it does not form a rogue cluster from a stale configuration. However, it also unintentionally cleared the ScaleCube seed members, leaving the restarted node with no address to contact in order to discover the existing cluster. With no seeds, discovery depended on a probabilistic ScaleCube suspicion-probe race: sometimes the node was found in time and sometimes it was not, which made the test flaky.
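To make the coupling concrete, here is a minimal, self-contained sketch of the pre-fix logic. It is not the actual ItNodeTest or test-cluster code: the class {{CoupledStartSketch}}, its {{StartParams}} holder, and the modelling of peers as plain port numbers and seeds as "host:port" strings are hypothetical simplifications. It only illustrates that a single {{emptyPeers}} flag empties both lists at once.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the pre-fix start logic (not Ignite/JRaft API).
final class CoupledStartSketch {
    /** What a (re)started node would be started with. */
    static final class StartParams {
        final List<String> seeds;         // gossip seed members, as "host:port"
        final List<Integer> initialConf;  // Raft initial configuration; null = not set

        StartParams(List<String> seeds, List<Integer> initialConf) {
            this.seeds = seeds;
            this.initialConf = initialConf;
        }
    }

    /** Buggy variant: one flag decides both the seeds and the Raft initial conf. */
    static StartParams start(String host, List<Integer> peerPorts, boolean emptyPeers) {
        List<String> seeds = List.of();    // empty unless the branch below runs
        List<Integer> initialConf = null;  // unset unless the branch below runs
        if (!emptyPeers) {
            seeds = new ArrayList<>();
            for (int port : peerPorts) {
                seeds.add(host + ":" + port);
            }
            initialConf = List.copyOf(peerPorts);
        }
        return new StartParams(seeds, initialConf);
    }

    public static void main(String[] args) {
        // Restarting a follower with emptyPeers=true.
        StartParams restarted = start("127.0.0.1", List.of(5001, 5002, 5003), true);
        System.out.println("seeds=" + restarted.seeds + ", initialConf=" + restarted.initialConf);
        // prints: seeds=[], initialConf=null
    }
}
{code}

The restarted follower ends up with neither an initial Raft configuration (intended) nor any seed to contact (unintended).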
*Fix:*
Decouple the two concerns -- seeds always use all cluster addresses; Raft conf
is only set when {{!emptyPeers}}:
{code:java}
// Always provide seeds so gossip discovery works after restart
List<NetworkAddress> addressList = Stream.concat(peers.stream(), learners.stream())
        .map(p -> new NetworkAddress(TestUtils.getLocalAddress(), p.getPort()))
        .collect(toList());

if (!emptyPeers) {
    nodeOptions.setInitialConf(...); // Raft conf still skipped when emptyPeers=true
}
{code}
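The same hypothetical model with the two concerns decoupled, as a sketch under the same assumptions: ports stand in for peers, "host:port" strings stand in for {{NetworkAddress}}, and the learner list is left out of the Raft conf for brevity. Seeds are computed unconditionally from peers and learners, mirroring the {{Stream.concat}} in the fix, while {{emptyPeers}} only decides whether an initial Raft configuration is set.

{code:java}
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model of the fixed start logic (not Ignite/JRaft API).
final class DecoupledStartSketch {
    static final class StartParams {
        final List<String> seeds;         // gossip seed members, as "host:port"
        final List<Integer> initialConf;  // Raft initial configuration; null = not set

        StartParams(List<String> seeds, List<Integer> initialConf) {
            this.seeds = seeds;
            this.initialConf = initialConf;
        }
    }

    /** Seeds always cover every peer and learner in the cluster. */
    static List<String> seedsFor(String host, List<Integer> peerPorts, List<Integer> learnerPorts) {
        return Stream.concat(peerPorts.stream(), learnerPorts.stream())
                .map(port -> host + ":" + port)
                .collect(Collectors.toList());
    }

    /** Fixed variant: emptyPeers only affects the Raft initial configuration. */
    static StartParams start(String host, List<Integer> peerPorts, List<Integer> learnerPorts,
                             boolean emptyPeers) {
        List<String> seeds = seedsFor(host, peerPorts, learnerPorts);
        List<Integer> initialConf = emptyPeers ? null : List.copyOf(peerPorts);
        return new StartParams(seeds, initialConf);
    }

    public static void main(String[] args) {
        StartParams restarted = start("127.0.0.1", List.of(5001, 5002, 5003), List.of(5004), true);
        System.out.println("seeds=" + restarted.seeds + ", initialConf=" + restarted.initialConf);
        // prints: seeds=[127.0.0.1:5001, 127.0.0.1:5002, 127.0.0.1:5003, 127.0.0.1:5004], initialConf=null
    }
}
{code}

A follower restarted with {{emptyPeers=true}} still receives every cluster address as a seed, so it can contact the existing members directly instead of depending on the suspicion-probe race, while its Raft configuration stays unset as intended.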
was:
{noformat}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
at app//org.apache.ignite.raft.jraft.core.ItNodeTest.testSetPeer2(ItNodeTest.java:2087){noformat}
TeamCity:
https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3x_Test_IntegrationTests_Raft/10064428
> ItNodeTest#testSetPeer2 is flaky
> --------------------------------
>
> Key: IGNITE-27905
> URL: https://issues.apache.org/jira/browse/IGNITE-27905
> Project: Ignite
> Issue Type: Bug
> Reporter: Vyacheslav Koptilin
> Assignee: Mirza Aliev
> Priority: Major
> Labels: MakeTeamcityGreenAgain, ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)