[ 
https://issues.apache.org/jira/browse/IGNITE-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003529#comment-18003529
 ] 

Alexander Lapin edited comment on IGNITE-25804 at 7/7/25 2:03 PM:
------------------------------------------------------------------

Likely the reason of the failure is the race between doStableSwitch processing 
on former leader and new leader onLeaderElected + newPendingAssignments 
processing.

 
{code:java}
[2025-07-04T16:47:22,041][INFO 
][%irbpaqt_tnrdqp_3346%rebalance-scheduler-0][RaftGroupServiceImpl] Sending 
changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3345, idx=0]] and learners=[] with leader term=1
        [2025-07-04T16:47:22,042][INFO 
][%irbpaqt_tnrdqp_3346%JRaft-Request-Processor-10][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 1 to 
<20_part_0/irbpaqt_tnrdqp_3346> from 127.0.0.1:3346, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345
        [2025-07-04T16:47:22,410][INFO 
][%irbpaqt_tnrdqp_3346%JRaft-Common-Executor-3][NodeImpl] Catch up phase to 
change peers from=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3346] to=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3345] was successfully finished               
[2025-07-04T16:47:22,411][INFO 
][%irbpaqt_tnrdqp_3346%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.         
[2025-07-04T16:47:22,411][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.         
[2025-07-04T16:47:22,411][INFO 
][%irbpaqt_tnrdqp_3344%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.

// Spoiled changePeers[2025-07-04T16:47:22,425][INFO 
][%irbpaqt_tnrdqp_3347%rebalance-scheduler-0][RaftGroupServiceImpl] Sending 
changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3345, idx=0]] and learners=[] with leader term=2


[2025-07-04T16:47:22,676][INFO 
][%irbpaqt_tnrdqp_3347%MessagingService-inbound-Default-0-0][RaftGroupServiceImpl]
 Sending changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3345, idx=0]] and learners=[Peer 
[consistentId=irbpaqt_tnrdqp_3346, idx=0]] with leader term=2            
[2025-07-04T16:47:22,676][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-16][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345,irbpaqt_tnrdqp_3346/learner
                [2025-07-04T16:47:22,677][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-16][NodeImpl] Catch up phase to 
change peers from=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3345] to=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3345] was successfully finished            // Learners part is 
likely missing.


// 3347, 3344[2025-07-04T16:47:22,739][INFO 
][%irbpaqt_tnrdqp_3347%MessagingService-inbound-Default-0-0][RaftGroupServiceImpl]
 Sending changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0]] and learners=[] with leader term=2
        [2025-07-04T16:47:22,739][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-0][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344
                [2025-07-04T16:47:22,740][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-0][NodeImpl] Catch up phase to 
change peers from=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3345] to=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344] was 
successfully finished                                          
[2025-07-04T16:47:22,740][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344.
// 3347, 3344, 3346[2025-07-04T16:47:22,786][INFO 
][%irbpaqt_tnrdqp_3347%MessagingService-inbound-Default-0-0][RaftGroupServiceImpl]
 Sending changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3346, idx=0]] and learners=[] with leader term=2
        [2025-07-04T16:47:22,786][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-1][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346        
                [2025-07-04T16:47:22,786][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Common-Executor-0][NodeImpl] Catch up phase to 
change peers from=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344] 
to=[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, irbpaqt_tnrdqp_3346] was 
successfully finished                                            
[2025-07-04T16:47:22,787][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346.                 
[2025-07-04T16:47:22,787][INFO 
][%irbpaqt_tnrdqp_3346%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346.                 
[2025-07-04T16:47:22,787][INFO 
][%irbpaqt_tnrdqp_3344%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346.
// Delayed spoiled changePeers prcessing.       [2025-07-04T16:47:23,837][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-17][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345               
[2025-07-04T16:47:24,365][INFO 
][%irbpaqt_tnrdqp_3347%MessagingService-inbound-Default-0-0][RaftGroupServiceImpl]
 Sending changePeersAndLearnersAsync request for group=20_part_0 to peers=[Peer 
[consistentId=irbpaqt_tnrdqp_3347, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3344, idx=0], Peer 
[consistentId=irbpaqt_tnrdqp_3346, idx=0]] and learners=[Peer 
[consistentId=irbpaqt_tnrdqp_3345, idx=0]] with leader term=2
        [2025-07-04T16:47:24,366][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-2][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346,irbpaqt_tnrdqp_3345/learner
                [2025-07-04T16:47:24,366][WARN 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-2][NodeImpl] Node 
<20_part_0/irbpaqt_tnrdqp_3347> refused configuration concurrent changing.

// Переехали на левую конфигурацию              [2025-07-04T16:47:24,450][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.         
[2025-07-04T16:47:24,450][INFO 
][%irbpaqt_tnrdqp_3346%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.         
[2025-07-04T16:47:24,451][INFO 
][%irbpaqt_tnrdqp_3344%JRaft-FSMCaller-Disruptor_stripe_0-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.
// Странно, возможно реордеринг в логах         [2025-07-04T16:47:24,451][INFO 
][%irbpaqt_tnrdqp_3345%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.         
[2025-07-04T16:47:24,451][INFO 
][%irbpaqt_tnrdqp_3345%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter]
 onConfigurationCommitted: irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344.
                [2025-07-04T16:47:24,451][INFO 
][%irbpaqt_tnrdqp_3345%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346.         
[2025-07-04T16:47:24,451][INFO 
][%irbpaqt_tnrdqp_3345%JRaft-FSMCaller-Disruptor_stripe_1-0][StateMachineAdapter]
 onConfigurationCommitted: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345.
                        
[2025-07-04T16:47:24,568][ERROR][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-6][RpcRequestProcessor]
 handleRequest ChangePeersAndLearnersAsyncRequestImpl [groupId=20_part_0, 
leaderId=irbpaqt_tnrdqp_3347, newLearnersList=ArrayList [irbpaqt_tnrdqp_3345], 
newPeersList=ArrayList [irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3344, 
irbpaqt_tnrdqp_3346], term=2] failed
        [2025-07-04T16:47:24,567][INFO 
][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-6][BaseCliRequestProcessor] 
Receive ChangePeersAndLearnersAsyncRequest with term 2 to 
<20_part_0/irbpaqt_tnrdqp_3347> from 127.0.0.1:3347, new conf is 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346,irbpaqt_tnrdqp_3345/learner
                            
[2025-07-04T16:47:24,568][ERROR][%irbpaqt_tnrdqp_3347%JRaft-Request-Processor-6][ConfigurationEntry]
 Invalid conf entry ConfigurationEntry [id=null, 
conf=irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346,irbpaqt_tnrdqp_3345/learner,
 oldConf=irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3345], peers 
and learners have intersection: [irbpaqt_tnrdqp_3345].
java.lang.IllegalArgumentException: Invalid conf entry: 
irbpaqt_tnrdqp_3347,irbpaqt_tnrdqp_3344,irbpaqt_tnrdqp_3346,irbpaqt_tnrdqp_3345/learner
 {code}


was (Author: alapin):
Likely the reason of the failure is the race between doStableSwitch processing 
on former leader and new leader onLeaderElected + newPendingAssignments 
processing.

> ItRebalanceByPendingAssignmentsQueueTest#testNodeRestartDuringQueueProcessing 
> is flaky
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-25804
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25804
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain, ignite-3
>
> {code:java}
> Expected: <[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3346, irbpaqt_tnrdqp_3345, 
> irbpaqt_tnrdqp_3344]>
>      but: was <[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3346, 
> irbpaqt_tnrdqp_3344]>
> java.lang.AssertionError: 
> Expected: <[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3346, irbpaqt_tnrdqp_3345, 
> irbpaqt_tnrdqp_3344]>
>      but: was <[irbpaqt_tnrdqp_3347, irbpaqt_tnrdqp_3346, 
> irbpaqt_tnrdqp_3344]>
>     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>     at 
> org.apache.ignite.internal.rebalance.ItRebalanceByPendingAssignmentsQueueTest.lambda$testNodeRestartDuringQueueProcessing$12(ItRebalanceByPendingAssignmentsQueueTest.java:307)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to