[
https://issues.apache.org/jira/browse/IGNITE-23708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Efremov updated IGNITE-23708:
-------------------------------------
Summary: testAutomaticRebalanceIfMajorityIsLost is flaky (was:
testAutomaticRebalanceIfMajorityIsLost is flacky)
> testAutomaticRebalanceIfMajorityIsLost is flaky
> -----------------------------------------------
>
> Key: IGNITE-23708
> URL: https://issues.apache.org/jira/browse/IGNITE-23708
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Efremov
> Assignee: Mikhail Efremov
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Description*
> While working on IGNITE-22036, an issue was found with the
> {{ItDisasterRecoveryReconfigurationTest#testAutomaticRebalanceIfMajorityIsLost}}
> test. It fails intermittently in two places:
> # On the {{assertRealAssignments(node0, partId, 1)}} check, with {{[0, 1, 2]}}
> as the actual result. The reason is that the previous rebalance, triggered by
> the scale-down timer and left unfinished (as expected, because the majority of
> {{[1, 3, 4]}} is lost after 3 and 4 are stopped), starts new replication
> groups with the corresponding partition, and the subsequent force-reset
> rebalance with {{[1]}} cannot finish before the assertion.
> # On {{assertNull(getPendingAssignments(node0, partId))}} we can still have
> non-null pending assignments, because the partition reset left non-forced
> planned assignments equal to {{[1]}}, and even if the force reset from item 1
> is done, the non-forced assignment rebalance may also be slightly late.
> The solution for 1 is simply to increase the timeout inside
> {{assertRealAssignments}}. The solution for 2 is to check whether the reset
> assignments and the planned assignments are equal and, if so, leave the latter
> as {{null}}, because there is no need for a rebalance that is de facto the
> same. The test highlights this drawback in the implementation, and it should
> be fixed.
> *Motivation*
> There shouldn't be any flaky tests, and the implementation is flawed and
> should be fixed.
> *Definition of Done*
> # {{assertRealAssignments}}'s timeout is increased from 2000ms up to 5000ms.
> # Inside {{GroupUpdateRequest#partitionUpdate}}, for the alive-nodes case, we
> should check whether {{partAssignments}} equals {{stableAssignments}} and, if
> so, put {{null}} as the planned assignments instead of {{partAssignments}}.
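
The second Definition of Done item could be sketched roughly as below. This is a minimal illustration only, not the actual {{GroupUpdateRequest#partitionUpdate}} code: the class name, the {{plannedAfterReset}} helper, and the use of {{Set<String>}} for node names are all assumptions made for the example. Only the variable names {{partAssignments}} and {{stableAssignments}} come from the issue text.

```java
import java.util.Set;

/** Hypothetical sketch; not the real Ignite implementation. */
public class PlannedAssignmentsSketch {
    /**
     * Picks the planned assignments to store alongside a forced reset.
     * Returns null when the planned set would equal the stable set,
     * so that no redundant follow-up rebalance is scheduled.
     *
     * Both parameters are assumed to be plain sets of node names.
     */
    static Set<String> plannedAfterReset(Set<String> partAssignments, Set<String> stableAssignments) {
        // Equal sets mean the follow-up rebalance would change nothing.
        if (partAssignments.equals(stableAssignments)) {
            return null;
        }
        // Otherwise keep the planned assignments as before.
        return partAssignments;
    }

    public static void main(String[] args) {
        // Same assignments: no planned rebalance is needed.
        Set<String> same = plannedAfterReset(Set.of("node1"), Set.of("node1"));
        System.out.println("planned when equal: " + same);

        // Different assignments: the planned set is kept.
        Set<String> differ = plannedAfterReset(Set.of("node0", "node1"), Set.of("node1"));
        System.out.println("planned is kept when different: " + (differ != null));
    }
}
```

With planned assignments left as {{null}} in the equal case, a check such as {{assertNull(getPendingAssignments(node0, partId))}} would no longer race with a second, effectively identical rebalance.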
--
This message was sent by Atlassian Jira
(v8.20.10#820010)