[ 
https://issues.apache.org/jira/browse/IGNITE-23823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Gusakov updated IGNITE-23823:
------------------------------------
    Description: 
*Motivation*
We have an issue with choosing the right node as the target for reset, 
because the lastLogId check does not provide the guarantees we expect:
* stable.assignments = [A(10),B(10),C(10),D(5),E(4)]. A(10) means that node A 
has the lastLogId=10.
* Nodes A, B, C die.
* The HA reset timer is exhausted.
* Node D(5) is chosen as the reset target from the [D(5),E(4)] list.
* Reset process is initiated:
  * stable.assignments = [A,B,C,D,E], pending.assignments=[D(5)], 
planned.assignments=[D,E].
* Everything looks good, *but node E may still have an arbitrarily long queue 
of unprocessed messages in its local queue*, and when it processes them, its 
index will increase, for example to 6' and then to 7': E[7'].
* So, after the first rebalance succeeds we will have 
pending.assignments=[D(7),E(7')], stable.assignments=[D(6)]. The index of D 
increased by two because of the 2 rebalance reconfigurations (the joint 
configuration + target configuration entries). 

At this point we have a hidden data inconsistency between the two raft nodes 
at index 7.

So, the main problem: we assume that after the lastLogId check on the broken 
group (no majority) there can be no further lastLogId updates - but that is 
not true. Some messages can still be "in-flight" on the local raft node.
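The flawed selection step above can be sketched as follows (a minimal illustration, not actual Ignite code; the class and method names are hypothetical). Picking the alive node with the highest observed lastLogId is only safe if those values cannot change after the check, but in-flight messages on a local raft node can still advance them:

```java
import java.util.Map;

// Hypothetical sketch (not the Ignite API): choose the reset target as the
// alive node with the greatest lastLogId observed at check time.
class ResetTargetChooser {
    static String chooseTarget(Map<String, Long> aliveLastLogIds) {
        return aliveLastLogIds.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Snapshot at reset time: D(5), E(4) -> D is chosen.
        System.out.println(chooseTarget(Map.of("D", 5L, "E", 4L))); // D

        // But E's local queue may be drained after the check, advancing its
        // lastLogId (e.g. to 7'), so the same comparison would now pick E:
        System.out.println(chooseTarget(Map.of("D", 5L, "E", 7L))); // E
    }
}
```

The point of the sketch: the comparison itself is fine; the bug is that its inputs are not stable once the group has no majority, because local in-flight messages can still be applied.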

  was:
*Motivation*
We have an issue with choosing the right node as the target for reset and 
further operation:
* stable.assignments = [A(10),B(10),C(10),D(5),E(4)]. A(10) means that node A 
has the lastLogId=10.
* Nodes A, B, C die.
* The HA reset timer is exhausted.
* Node D(5) is chosen as the reset target from the [D(5),E(4)] list.
* Reset process is initiated:
  * stable.assignments = [A,B,C,D,E], pending.assignments=[D(5)], 
planned.assignments=[D,E].
* Everything looks good, *but node E may still have an arbitrarily long queue 
of unprocessed messages in its local queue*, and when it processes them, its 
index will increase, for example to 6' and then to 7': E[7'].
* So, after the first rebalance succeeds we will have 
pending.assignments=[D(7),E(7')], stable.assignments=[D(6)]. The index of D 
increased by two because of the 2 rebalance reconfigurations (the joint 
configuration + target configuration entries). 

At this point we have a hidden data inconsistency between the two raft nodes 
at index 7.

So, the main problem: we assume that after the lastLogId check on the broken 
group (no majority) there can be no further lastLogId updates - but that is 
not true. Some messages can still be "in-flight" on the local raft node.


> Clean planned nodes on
> ----------------------
>
>                 Key: IGNITE-23823
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23823
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Gusakov
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
