Mikhail Efremov created IGNITE-23708:
----------------------------------------
Summary: testAutomaticRebalanceIfMajorityIsLost is flaky
Key: IGNITE-23708
URL: https://issues.apache.org/jira/browse/IGNITE-23708
Project: Ignite
Issue Type: Bug
Reporter: Mikhail Efremov
Assignee: Mikhail Efremov
*Description*
While working on IGNITE-22036, an issue was found with the
=ItDisasterRecoveryReconfigurationTest#testAutomaticRebalanceIfMajorityIsLost=
test. It is flaky in two places:
1. On the =assertRealAssignments(node0, partId, 1)= check, the actual result is
=[0, 1, 2]=. The reason is the previous rebalance, triggered by the scale-down
timer, which starts new replication groups for the corresponding partition.
That rebalance stays unfinished (as expected, because the majority of
=[1, 3, 4]= is lost once nodes 3 and 4 are stopped), and the subsequent forced
reset rebalance to =[1]= may not finish before the assertion.
2. On =assertNull(getPendingAssignments(node0, partId))=, the pending
assignments may still be non-null, because the partition reset also writes
non-forced planned assignments equal to =[1]=. Even after the forced reset from
point 1 is done, the rebalance for these non-forced assignments may finish a
little later.
The solution for 1 is simply to increase the timeout inside
=assertRealAssignments=. The solution for 2 is to check whether the reset
assignments and the planned assignments are equal and, if so, leave the planned
assignments as =null=, because there is no need for a rebalance that is de facto
the same. The test highlights this drawback in the implementation, and it
should be fixed.
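A minimal sketch of the check proposed for point 2, assuming assignments can be modelled as sets of node names. The class =ResetPlanning= and the method =plannedOrNull= are hypothetical illustrations, not Ignite's actual code; the real change belongs in =GroupUpdateRequest#partitionUpdate= and works with Ignite's own assignment types.

```java
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical helper (not Ignite code): decides what to write as the planned
 * assignments after a forced partition reset.
 */
final class ResetPlanning {
    private ResetPlanning() {
    }

    /**
     * Returns null when the planned assignments would only repeat the forced
     * reset assignments, since a second rebalance to the same set of nodes is
     * redundant; otherwise returns the planned assignments unchanged.
     */
    static Set<String> plannedOrNull(Set<String> resetAssignments, Set<String> plannedAssignments) {
        // Set.equals compares contents, so the concrete Set implementation does not matter.
        if (Objects.equals(resetAssignments, plannedAssignments)) {
            return null;
        }
        return plannedAssignments;
    }
}
```

Under this sketch, a reset whose planned assignments equal =[1]= (as in the test) leaves only the forced pending assignments, so no second, identical rebalance is scheduled.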
*Motivation*
There should be no flaky tests; moreover, the implementation is flawed and
should be fixed.
*Definition of Done*
1. The timeout of =assertRealAssignments= is increased from 2000 ms to 5000 ms.
2. Inside =GroupUpdateRequest#partitionUpdate=, for the case where the nodes
are alive, check whether =partAssignments == stableAssignments= and, if so, put
=null= as the planned assignments instead of =partAssignments=.
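Increasing the timeout in point 1 only helps if the assertion waits for the state to converge instead of checking it once. A minimal sketch of such a polling assertion follows; =Polling=, =waitForCondition= and =assertEventually= are hypothetical names, and the actual =assertRealAssignments= helper may be implemented differently.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helpers (not Ignite code). */
final class Polling {
    private Polling() {
    }

    /**
     * Re-evaluates cond every pollIntervalMs until it holds or timeoutMs elapses.
     * Returns true as soon as cond holds, false on timeout.
     */
    static boolean waitForCondition(BooleanSupplier cond, long timeoutMs, long pollIntervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (cond.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollIntervalMs);
        }
    }

    /** Fails with an AssertionError if cond does not hold within timeoutMs. */
    static void assertEventually(String message, BooleanSupplier cond, long timeoutMs)
            throws InterruptedException {
        if (!waitForCondition(cond, timeoutMs, 50)) {
            throw new AssertionError(message + " (not satisfied within " + timeoutMs + " ms)");
        }
    }
}
```

With a 5000 ms timeout instead of 2000 ms, a forced reset that completes slightly late still satisfies the assertion, while a reset that never completes still fails it.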
--
This message was sent by Atlassian Jira
(v8.20.10#820010)