[ 
https://issues.apache.org/jira/browse/IGNITE-23780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mirza Aliev reassigned IGNITE-23780:
------------------------------------

    Assignee:  Kirill Sizov

> Node restart behaviour for HA mode
> ----------------------------------
>
>                 Key: IGNITE-23780
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23780
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Mirza Aliev
>            Assignee:  Kirill Sizov
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> According to 
> [IEP-131|https://cwiki.apache.org/confluence/display/IGNITE/IEP-131%3A+Partition+Majority+Unavailability+Handling],
> when a node starts or restarts, we must decide for each HA partition whether 
> to start the raft group on this node, based on the stable and pending 
> assignments recovered from the metastorage.
> The tables below list the combinations of stable and pending assignments 
> (forced or not), whether the node is present in stable, in pending, or in 
> both, and the action on restart (start the raft group or not).
> Full table with all combinations and repetitions: 
> || # || stable || pending || in stable || in pending || in both || action ||
> | 1 | empty | empty | no | no | no | nothing |
> | 2 | empty | exists | no | no | no | nothing |
> | 3 | empty | forced | no | no | no | nothing |
> | 4 | exists | empty | yes | no | no | nothing |
> | 5 | exists | exists | yes | no | no | nothing |
> | 6 | exists | forced | yes | no | no | nothing |
> | 7 | empty | empty | no | no | no | nothing |
> | 8 | empty | exists | no | yes | no | start |
> | 9 | empty | forced | no | yes | no | start |
> | 10 | exists | empty | no | no | no | stop |
> | 11 | exists | exists | no | yes | no | start |
> | 12 | exists | forced | no | yes | no | start |
> | 13 | empty | empty | no | no | no | nothing |
> | 14 | empty | exists | no | no | no | nothing |
> | 15 | empty | forced | no | no | no | nothing |
> | 16 | exists | empty | no | no | no | nothing |
> | 17 | exists | exists | yes | yes | yes | nothing |
> | 18 | exists | forced | yes | yes | yes | nothing |
> | 19 | empty | empty | no | no | no | nothing |
> | 20 | empty | exists | no | no | no | nothing |
> | 21 | empty | forced | no | no | no | nothing |
> | 22 | exists | empty | no | no | no | nothing |
> | 23 | exists | exists | no | no | no | stop |
> | 24 | exists | forced | no | no | no | stop |
> Improved table, without repetitions: 
> || # || stable || pending || in stable || in pending || in both || on restart ||
> | 1 | empty | empty | no | no | no | nothing |
> | 2 | empty | exists | no | no | no | nothing |
> | 3 | empty | forced | no | no | no | nothing |
> | 4 | exists | empty | yes | no | no | start |
> | 5 | exists | exists | yes | no | no | start |
> | 6 | exists | forced | yes | no | no | nothing |
> | 7 | empty | exists | no | yes | no | start |
> | 8 | empty | forced | no | yes | no | start |
> | 9 | exists | empty | no | no | no | nothing |
> | 10 | exists | exists | no | yes | no | start |
> | 11 | exists | forced | no | yes | no | start |
> | 12 | exists | exists | yes | yes | yes | start |
> | 13 | exists | forced | yes | yes | yes | start |
> | 14 | exists | exists | no | no | no | nothing |
> | 15 | exists | forced | no | no | no | nothing |
> We have an invariant: if a node is in stable but not in a forced pending, 
> raft must not be started on that node. The following example shows why:
> 1) stable = [A, B, C]
> 2) pending = [A, force = true]
> 3) The rebalance has happened, but the stable switch has not, and the user 
> has written some data to A
> 4) Full restart
> 5) We cannot start raft nodes on B and C based on stable, because we would 
> lose the data written to A in step 3
> [~jakutenshi] and I independently formulated conditions for a node to decide 
> whether it should start its raft group. The developer of this ticket is 
> responsible for choosing the more appropriate one:
> {code:java}
> if (
>       (stable.contains(node) && (force && pending.contains(node))) ||
>       (stable.contains(node) && (!force)) ||
>       pending.contains(node)  
> ) {
>       // start the raft group on this node
> }
> {code}
> {code:java}
> stable.contains(node)
>   && !(pending.contains(node) || pending.isForce())
>   || pending.contains(node)
> {code}
> h3. Implementation notes
> The aforementioned condition must be integrated into 
> {{TableManager#startPartitionAndStartClient}} for the case when 
> {{boolean isRecovery == true}}
> h3. Definition of done
> * The node correctly decides whether to start its raft group based on the 
> metastorage (MS) assignments keys 
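
The two start conditions quoted in the description look different but appear to be logically equivalent: both reduce to "the node is in pending, or the node is in stable and pending is not forced". The sketch below is only an illustration of that reasoning, not Ignite code. The class name RaftStartDecision, its method names, and the choice to model an absent assignments key as an empty set are all assumptions made for this example.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Ignite code base.
final class RaftStartDecision {
    private RaftStartDecision() {
    }

    // First condition from the ticket, written over plain booleans.
    static boolean firstForm(boolean inStable, boolean inPending, boolean forced) {
        return (inStable && (forced && inPending))
                || (inStable && !forced)
                || inPending;
    }

    // Second condition from the ticket.
    static boolean secondForm(boolean inStable, boolean inPending, boolean forced) {
        return inStable && !(inPending || forced) || inPending;
    }

    // Simplified form: both conditions above reduce to this by absorption.
    static boolean simplified(boolean inStable, boolean inPending, boolean forced) {
        return inPending || (inStable && !forced);
    }

    // Set-based wrapper. Assumption: an absent assignments key is passed as an empty set.
    static boolean shouldStartRaftGroup(
            String node, Set<String> stable, Set<String> pending, boolean forced) {
        return simplified(stable.contains(node), pending.contains(node), forced);
    }

    // Returns true if all three forms agree on each of the 8 input combinations.
    static boolean formsAgree() {
        boolean[] values = {false, true};
        for (boolean inStable : values) {
            for (boolean inPending : values) {
                for (boolean forced : values) {
                    boolean first = firstForm(inStable, inPending, forced);
                    boolean second = secondForm(inStable, inPending, forced);
                    boolean simple = simplified(inStable, inPending, forced);
                    if (first != second || first != simple) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + formsAgree());

        // The example from the ticket: stable = [A, B, C], pending = [A, force = true].
        Set<String> stable = Set.of("A", "B", "C");
        Set<String> pending = Set.of("A");
        for (String node : new String[] {"A", "B", "C"}) {
            System.out.println(node + " starts: "
                    + shouldStartRaftGroup(node, stable, pending, true));
        }
    }
}
```

With the example from the ticket, only A starts its raft group, while B and C do not, which matches the invariant. The eight boolean combinations also cover every distinct row of the improved table, so the simplified form can be checked against the "on restart" column row by row.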



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
