[
https://issues.apache.org/jira/browse/IGNITE-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Lapin updated IGNITE-22801:
-------------------------------------
Issue Type: Bug (was: Improvement)
> Extend changePeers with term param in order to skip obsolete rebalance
> ----------------------------------------------------------------------
>
> Key: IGNITE-22801
> URL: https://issues.apache.org/jira/browse/IGNITE-22801
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexander Lapin
> Assignee: Alexander Lapin
> Priority: Major
> Labels: ignite-3
>
> h3. Motivation
> The raft topology adjustment logic for both CMG and MG is broken because
> topology adjustments can be reordered.
> From a node-local point of view, the following topology adjustment triggers
> # Add [B] as Learner -> resetLearners(B);
> # Add [B, C] as Learners -> resetLearners(B,C);
> may be reordered so that [B] is applied after [B,C]. Node C then won't be
> treated as a learner and will never receive its portion of data.
> Note that currently the node collocated with the CMG/MG leader manages the
> corresponding raft topology adjustment. That means that node A may believe
> that it is collocated with the leader and send resetLearners, while in
> reality node B is the one collocated with the leader. Thus distributed
> reordering is possible:
> # Add [B] as Learner -> resetLearners(B);
> # Remove [B] as Learner -> resetLearners();
> # Add [B] as Learner -> resetLearners(B);
> Node A: resetLearners(B) -> is about to send resetLearners() ... hangs
> Node D: resetLearners(B) -> resetLearners() -> resetLearners(B)
> Node A wakes up and sends resetLearners(), which is incorrect. Moreover, it
> will never add node B back, because A no longer believes that it is
> collocated with the leader.
> Node-local reorderings will be covered in dedicated tickets for CMG and MG.
> This ticket only needs to solve the distributed reordering issue.
> h3. Definition of Done
> * Configuration changes proposed by an old leader should be skipped.
> According to the current CMG/MG design, the new leader will pick up the
> process.
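
The term check asked for above can be illustrated with a minimal sketch. It is not the Ignite or JRaft API: the class TermGuardedTopology, its methods, and the exact-match term comparison are all hypothetical and only model the idea from the ticket title, namely that a topology change carries the term its sender observed and is skipped when that term is obsolete. The main method replays the Node A / Node D scenario from the Motivation section.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a raft group that guards topology changes by term. */
final class TermGuardedTopology {
    private long currentTerm = 1;
    private Set<String> learners = new TreeSet<>();

    /** Simulates a leader election: every new leader gets a higher term. */
    long electNewLeader() {
        return ++currentTerm;
    }

    long currentTerm() {
        return currentTerm;
    }

    /**
     * Applies the learner set only if the caller observed the current term.
     * Returns false when the request was issued under an obsolete term.
     */
    boolean resetLearners(Set<String> newLearners, long observedTerm) {
        if (observedTerm != currentTerm) {
            return false; // stale request: skip it, the new leader drives the change
        }
        learners = new TreeSet<>(newLearners);
        return true;
    }

    /** Returns an immutable snapshot of the current learners. */
    Set<String> learners() {
        return Set.copyOf(learners);
    }

    public static void main(String[] args) {
        TermGuardedTopology group = new TermGuardedTopology();

        // Node A believes it is collocated with the leader and adds B.
        long termA = group.currentTerm();
        group.resetLearners(Set.of("B"), termA);
        // Node A hangs here, before sending resetLearners().

        // Leadership moves; node D performs the whole sequence under the new term.
        long termD = group.electNewLeader();
        group.resetLearners(Set.of("B"), termD);
        group.resetLearners(Set.of(), termD);
        group.resetLearners(Set.of("B"), termD);

        // Node A wakes up and sends its delayed request with the old term.
        boolean applied = group.resetLearners(Set.of(), termA);
        System.out.println("stale request applied: " + applied);
        System.out.println("learners: " + group.learners());
    }
}
```

In this sketch the delayed resetLearners() from node A is rejected because it carries the old term, so B stays a learner and only the new leader's sequence takes effect.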
--
This message was sent by Atlassian Jira
(v8.20.10#820010)