[ https://issues.apache.org/jira/browse/IGNITE-22801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Lapin updated IGNITE-22801:
-------------------------------------
    Description:
h3. Motivation

Both the CMG and MG raft topology adjustment logic is broken because topology adjustments can be reordered.

From the node-local point of view, the following topology adjustment triggers
# Add [B] as Learner -> resetLearners(B);
# Add [B, C] as Learners -> resetLearners(B,C);

may be reordered so that [B] is applied after [B,C]; node C then won't be treated as a learner and will never receive its portion of data.

Worth mentioning that currently the node collocated with the CMG leader/MG leader manages the corresponding raft topology adjustment. That means that if node A believes it is the one collocated with the leader, it will send resetLearners, while in reality node B is the one collocated with the leader; thus a distributed reordering is possible:
# Add [B] as Learner -> resetLearners(B);
# Remove [B] as Learner -> resetLearners();
# Add [B] as Learner -> resetLearners(B);

Node A: resetLearners(B) -> is about to resetLearners() ... hangs
Node D: resetLearners(B) -> resetLearners() -> resetLearners(B)
Node A wakes up and sends resetLearners(), which is incorrect; besides that, it will never add node B back, because A no longer believes that it is collocated with the leader.

Node-local reorderings will be covered in dedicated tickets for CMG and MG. The current ticket is required to solve the distributed reordering issue.

h3. Definition of Done
* Configuration changes proposed by an old leader should be skipped. According to the current CMG/MG design, the new leader will catch up the process.

h3. Implementation Notes
* Basically it's required to add a term parameter to the changePeers method, as is done for changePeersAsync. In case of a term mismatch, the configuration adjustment proposal should be skipped.
* Worth mentioning that currently CMG uses resetLearners; however, we've agreed to use changePeers instead.

  was:
h3. Motivation

Both the CMG and MG raft topology adjustment logic is broken because topology adjustments can be reordered.

From the node-local point of view, the following topology adjustment triggers
# Add [B] as Learner -> resetLearners(B);
# Add [B, C] as Learners -> resetLearners(B,C);

may be reordered so that [B] is applied after [B,C]; node C then won't be treated as a learner and will never receive its portion of data.

Worth mentioning that currently the node collocated with the CMG leader/MG leader manages the corresponding raft topology adjustment. That means that if node A believes it is the one collocated with the leader, it will send resetLearners, while in reality node B is the one collocated with the leader; thus a distributed reordering is possible:
# Add [B] as Learner -> resetLearners(B);
# Remove [B] as Learner -> resetLearners();
# Add [B] as Learner -> resetLearners(B);

Node A: resetLearners(B) -> is about to resetLearners() ... hangs
Node D: resetLearners(B) -> resetLearners() -> resetLearners(B)
Node A wakes up and sends resetLearners(), which is incorrect; besides that, it will never add node B back, because A no longer believes that it is collocated with the leader.

Node-local reorderings will be covered in dedicated tickets for CMG and MG. The current ticket is required to solve the distributed reordering issue.

h3. Definition of Done
* Configuration changes proposed by an old leader should be skipped. According to the current CMG/MG design, the new leader will catch up the process.

h3. Implementation Notes
* Basically it's required to add a term parameter to the changePeers method, as is done for changePeersAsync.


> Extend changePeers with term param in order to skip obsolete rebalance
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-22801
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22801
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
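
The term check from the Implementation Notes can be illustrated with a minimal Java sketch that replays the Node A / Node D scenario from the description. Everything in it is an assumption made for illustration: the `Group` class, its `changePeers(Set, long)` signature, and `electNewLeader()` are hypothetical stand-ins, not the Ignite or JRaft API, and a real implementation would perform the check inside the raft group rather than in a local object.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a raft group; not the Ignite API. */
public class TermGuardSketch {
    static final class Group {
        private long term;
        private Set<String> learners = new TreeSet<>();

        Group(long initialTerm) {
            this.term = initialTerm;
        }

        long term() {
            return term;
        }

        /** Simulates a leader change: the term grows. */
        void electNewLeader() {
            term++;
        }

        /**
         * Applies the proposed learner set only if the proposer saw the current term.
         * Returns false (proposal skipped) on a term mismatch.
         */
        boolean changePeers(Set<String> newLearners, long proposerTerm) {
            if (proposerTerm != term) {
                return false; // proposal from an old leader: skip it
            }
            learners = new TreeSet<>(newLearners);
            return true;
        }

        Set<String> learners() {
            return new TreeSet<>(learners);
        }
    }

    public static void main(String[] args) {
        Group group = new Group(1);

        // Node A believes it is collocated with the leader at term 1.
        long termSeenByA = group.term();
        group.changePeers(Set.of("B"), termSeenByA); // Add [B]

        // Node A hangs before sending "Remove [B]"; leadership moves to Node D.
        group.electNewLeader();
        long termSeenByD = group.term();

        // Node D replays the whole sequence under the new term.
        group.changePeers(Set.of("B"), termSeenByD); // Add [B]
        group.changePeers(Set.of(), termSeenByD);    // Remove [B]
        group.changePeers(Set.of("B"), termSeenByD); // Add [B]

        // Node A wakes up and sends its stale "Remove [B]" with the old term.
        boolean applied = group.changePeers(Set.of(), termSeenByA);

        System.out.println("stale proposal applied: " + applied); // false
        System.out.println("learners: " + group.learners());      // [B]
    }
}
```

With the guard in place, the stale proposal from Node A is skipped and B remains a learner. Without it, the final learner set would be empty, which is the failure described in the Motivation section.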