[ https://issues.apache.org/jira/browse/IGNITE-27905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mirza Aliev updated IGNITE-27905:
---------------------------------
    Description: 
{noformat}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
  at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
  at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
  at app//org.apache.ignite.raft.jraft.core.ItNodeTest.testSetPeer2(ItNodeTest.java:2087)
{noformat}
TeamCity: [https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3x_Test_IntegrationTests_Raft/10064428]

 
*Root cause:*

When {{cluster.start(peer, emptyPeers=true, ...)}} is called to restart a
follower, the {{emptyPeers}} flag controlled both the Raft initial
configuration and the ScaleCube gossip seed members:
{code:java}
  List<NetworkAddress> addressList = List.of();  // seeds are empty by default

  if (!emptyPeers) {
      addressList = ...all peers...;             // seeds only set when NOT emptyPeers
      nodeOptions.setInitialConf(...);           // Raft conf only set when NOT emptyPeers
  }
{code}

The intent of {{emptyPeers=true}} is to restart a node without a pre-configured
Raft peer list, so that it doesn't form a rogue cluster from a stale
configuration. But the flag also cleared the ScaleCube seed members, leaving
restarted nodes with no addresses to contact in order to discover the existing
cluster. With no seeds, discovery depended on a ScaleCube suspicion-probe race
that sometimes succeeded and sometimes didn't, which made the test flaky.

*Fix:*
Decouple the two concerns: seeds always use all cluster addresses, and the Raft
conf is only set when {{!emptyPeers}}:
{code:java}
  // Always provide seeds so gossip discovery works after restart
  List<NetworkAddress> addressList = Stream.concat(peers.stream(), learners.stream())
          .map(p -> new NetworkAddress(TestUtils.getLocalAddress(), p.getPort()))
          .collect(toList());

  if (!emptyPeers) {
      nodeOptions.setInitialConf(...);  // Raft conf still skipped when emptyPeers=true
  }
{code}
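
To make the difference concrete, here is a minimal, self-contained sketch that models the two variants side by side. It is not the Ignite test code: {{SeedConfigSketch}}, {{StartConfig}}, {{coupled}}, {{decoupled}} and the plain port numbers are hypothetical stand-ins for {{NetworkAddress}}, {{nodeOptions}} and the real peer objects, and the loopback host is assumed for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative model only; every name here is hypothetical, not Ignite or ScaleCube API. */
final class SeedConfigSketch {
    /** Stand-in for what a restarted node is given: gossip seeds and whether a Raft conf was set. */
    static final class StartConfig {
        final List<String> seeds;
        final boolean raftConfSet;

        StartConfig(List<String> seeds, boolean raftConfSet) {
            this.seeds = seeds;
            this.raftConfSet = raftConfSet;
        }
    }

    /** Old behaviour: seeds and Raft conf are both gated by emptyPeers. */
    static StartConfig coupled(List<Integer> peerPorts, List<Integer> learnerPorts, boolean emptyPeers) {
        List<String> seeds = List.of();
        boolean raftConfSet = false;
        if (!emptyPeers) {
            seeds = addresses(peerPorts, learnerPorts);
            raftConfSet = true;
        }
        return new StartConfig(seeds, raftConfSet);
    }

    /** New behaviour: seeds are always provided; only the Raft conf depends on emptyPeers. */
    static StartConfig decoupled(List<Integer> peerPorts, List<Integer> learnerPorts, boolean emptyPeers) {
        List<String> seeds = addresses(peerPorts, learnerPorts);
        return new StartConfig(seeds, !emptyPeers);
    }

    /** Builds "host:port" strings for all peers and learners (loopback host assumed). */
    private static List<String> addresses(List<Integer> peerPorts, List<Integer> learnerPorts) {
        return Stream.concat(peerPorts.stream(), learnerPorts.stream())
                .map(port -> "127.0.0.1:" + port)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> peers = List.of(5003, 5004, 5005);
        List<Integer> learners = List.of();

        // Restart a follower with emptyPeers=true under both variants.
        StartConfig before = coupled(peers, learners, true);
        StartConfig after = decoupled(peers, learners, true);

        System.out.println("coupled:   seeds=" + before.seeds + ", raftConfSet=" + before.raftConfSet);
        System.out.println("decoupled: seeds=" + after.seeds + ", raftConfSet=" + after.raftConfSet);
    }
}
```

In this model a restart with {{emptyPeers=true}} yields an empty seed list under the coupled variant and the full address list under the decoupled one, while the Raft conf stays unset in both cases.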

  was:
{noformat}
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
  at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
  at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
  at app//org.apache.ignite.raft.jraft.core.ItNodeTest.testSetPeer2(ItNodeTest.java:2087)
{noformat}
TeamCity: 
https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3x_Test_IntegrationTests_Raft/10064428


> ItNodeTest#testSetPeer2 is flaky
> --------------------------------
>
>                 Key: IGNITE-27905
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27905
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vyacheslav Koptilin
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
