[ 
https://issues.apache.org/jira/browse/IGNITE-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-22801:
-------------------------------------
    Issue Type: Bug  (was: Improvement)

> Extend changePeers with term param in order to skip obsolete rebalance
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-22801
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22801
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> The raft topology adjustment logic of both CMG and MG is broken because 
> topology adjustments may be reordered.
> From a node-local point of view, the following topology adjustment triggers
>  # Add [B] as Learner -> resetLearners(B);
>  # Add [B, C] as Learners -> resetLearners(B,C);
> may be reordered in such a way that [B] is applied after [B,C]; node C is 
> then not treated as a learner and never receives its portion of data.
> Note that currently the node collocated with the CMG/MG leader manages the 
> corresponding raft topology adjustment. This means that if node A believes 
> it is collocated with the leader, it will send resetLearners, while in 
> reality node B is the one collocated with the leader. Thus distributed 
> reordering is also possible:
>  # Add [B] as Learner -> resetLearners(B);
>  # Remove [B] as Learner -> resetLearners();
>  # Add [B] as Learner -> resetLearners(B);
> Node A: resetLearners(B) -> is about to resetLearners() ... hangs
> Node D: resetLearners(B) -> resetLearners() -> resetLearners(B)
> Node A wakes up and sends resetLearners(), which is incorrect. Moreover, it 
> will never add node B back, because A no longer believes that it is 
> collocated with the leader.
> Node-local reorderings will be covered in dedicated tickets for CMG and MG. 
> This ticket must solve the distributed reordering issue.
> h3. Definition of Done
>  * Configuration changes proposed by an old leader should be skipped. 
> According to the current CMG/MG design, the new leader will pick up the 
> process.
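
The Definition of Done above can be illustrated with a minimal sketch, assuming the proposer attaches the raft term in which it believed itself collocated with the leader, and the group skips any change whose term does not match the current one. The class and method names below (TermGuardedLearners, onLeaderElected, the proposerTerm parameter) are hypothetical and are not Ignite's actual API; the sketch only models the skip rule, not real raft configuration changes.

```java
import java.util.Set;

/** Hypothetical model of a raft group that guards learner changes by term. */
final class TermGuardedLearners {
    /** Term of the most recently elected leader known to this group. */
    private long currentTerm;

    /** Currently applied set of learners. */
    private Set<String> learners = Set.of();

    /** Records a leader election; terms only move forward. */
    synchronized void onLeaderElected(long term) {
        if (term > currentTerm) {
            currentTerm = term;
        }
    }

    /**
     * Applies the learner change only if it was proposed in the current term.
     *
     * @param newLearners  full learner set requested by the proposer
     * @param proposerTerm term in which the proposer considered itself
     *                     collocated with the leader (assumed extra parameter)
     * @return true if applied, false if skipped as obsolete
     */
    synchronized boolean resetLearners(Set<String> newLearners, long proposerTerm) {
        if (proposerTerm != currentTerm) {
            // The proposal comes from a node with an outdated view of leadership.
            return false;
        }
        learners = Set.copyOf(newLearners);
        return true;
    }

    /** Returns an immutable snapshot of the applied learner set. */
    synchronized Set<String> learners() {
        return learners;
    }
}
```

Replaying the scenario from the ticket under these assumptions: node A calls resetLearners(B) in term 1 and then hangs; after a leader change to term 2, node D applies resetLearners(B), resetLearners() and resetLearners(B) with term 2; when A wakes up, its resetLearners() still carries term 1 and is skipped, so B remains a learner.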



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
