[ 
https://issues.apache.org/jira/browse/IGNITE-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Mashenkov updated IGNITE-18171:
--------------------------------------
    Description: 
h2. Definitions.

Cluster nodes can be divided into the following groups. Each node may belong 
to one or more groups.
 * Cluster Management Group (CMG), which controls how new nodes join.
 * MetaStorage group (MSG), which hosts the meta storage.
 * Data node group (DNG), which only hosts table partitions.

The components (CMG, meta storage, table components) depend on each other, 
but may reside on different (even disjoint) node subsets. As a result, some 
components may become temporarily unavailable, and dependent components must 
be aware of this and handle it (wait, retry, throw an exception, etc.) in an 
expected way, which also has to be documented.
[See IEP for 
details|https://cwiki.apache.org/confluence/display/IGNITE/IEP-77%3A+Node+Join+Protocol+and+Initialization+for+Ignite+3]
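
One possible way a dependent component could combine the "wait, retry, throw an exception" options is sketched below. This is only an illustration, not Ignite code: {{RetrySketch}}, {{callWithRetry}} and {{ComponentUnavailableException}} are hypothetical names, and the expected behaviour for each component still has to be agreed on and documented.

```java
import java.util.function.Supplier;

public class RetrySketch {
    /** Hypothetical signal that a required component (e.g. meta storage) is not reachable. */
    static final class ComponentUnavailableException extends RuntimeException {
        ComponentUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Calls the action, waiting and retrying while the component is unavailable.
     * Gives up and rethrows after maxAttempts failed attempts.
     */
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        ComponentUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (ComponentUnavailableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis); // wait before the next attempt
                }
            }
        }
        throw new ComponentUnavailableException(
            "Still unavailable after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new ComponentUnavailableException("Interrupted while waiting", e);
        }
    }
}
```

A caller would wrap each access to a possibly unavailable component in {{callWithRetry}} and decide per use case whether the final exception should fail the operation or be handled further up.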
h2. Motivation.

As of now, the correct way to start the grid (after it was stopped) is to 
start CMG nodes first, then Meta Storage nodes, then Data nodes, and to stop 
them in the reverse order. Other scenarios are not tested and may lead to 
unexpected behaviour.
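
The currently supported order can be illustrated with a small sketch. It is not Ignite code: {{StartOrderSketch}}, {{Role}} and {{Node}} are hypothetical names, and the rule that a node belonging to several groups starts with the earliest of them is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class StartOrderSketch {
    /** Node groups from the Definitions section, declared in start order. */
    enum Role { CMG, META_STORAGE, DATA }

    /** Hypothetical node descriptor: a name plus the (non-empty) set of groups it belongs to. */
    static final class Node {
        final String name;
        final Set<Role> roles;

        Node(String name, Set<Role> roles) {
            this.name = name;
            this.roles = roles;
        }

        /** Assumption: a node starts together with the earliest group it belongs to. */
        int startPhase() {
            return Collections.min(roles).ordinal();
        }
    }

    /** CMG nodes first, then meta storage nodes, then data nodes. */
    static List<Node> startOrder(List<Node> nodes) {
        List<Node> ordered = new ArrayList<>(nodes);
        ordered.sort(Comparator.comparingInt(Node::startPhase));
        return ordered;
    }

    /** The stop order is the start order reversed. */
    static List<Node> stopOrder(List<Node> nodes) {
        List<Node> ordered = startOrder(nodes);
        Collections.reverse(ordered);
        return ordered;
    }

    /** Helper that extracts node names, for printing and checks. */
    static List<String> names(List<Node> nodes) {
        List<String> result = new ArrayList<>();
        for (Node node : nodes) {
            result.add(node.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("data-1", EnumSet.of(Role.DATA)),
            new Node("meta-1", EnumSet.of(Role.META_STORAGE, Role.DATA)),
            new Node("cmg-1", EnumSet.of(Role.CMG)));
        System.out.println("start: " + names(startOrder(nodes)));
        System.out.println("stop:  " + names(stopOrder(nodes)));
    }
}
```

Any other ordering of the same node list corresponds to one of the untested scenarios that this ticket proposes to describe.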

Let's describe all possible scenarios and the expected behaviour for each of 
them, and extend the test coverage.

 

h2. Results.
 * {_}Startup scenarios{_}, where nodes start in different orders, to check 
that the grid assembles and operates correctly. These seem to make sense only 
when a CMG node starts first, because the grid can't be assembled otherwise. 
 * {_}Restart scenarios{_}, where nodes of different roles are stopped and 
then started on various grid configurations, to check service degradation and 
restoration. In contrast to {_}startup scenarios{_}, some grid configurations 
work differently, e.g. a grid without a CMG node.
 * _Stop scenarios_ are covered by the "restart scenarios".

  was:
h2. Definitions.

Cluster nodes can be divided into the following groups. Each node may belong 
to one or more groups.
 * Cluster Management Group (CMG), which controls how new nodes join.
 * MetaStorage group (MSG), which hosts the meta storage.
 * Data node group (DNG), which only hosts table partitions.

The components (CMG, meta storage, table components) depend on each other, 
but may reside on different (even disjoint) node subsets. As a result, some 
components may become temporarily unavailable, and dependent components must 
be aware of this and handle it (wait, retry, throw an exception, etc.) in an 
expected way, which also has to be documented.
[See IEP for 
details|https://cwiki.apache.org/confluence/display/IGNITE/IEP-77%3A+Node+Join+Protocol+and+Initialization+for+Ignite+3]
h2. Motivation.

As of now, the correct way to start the grid (after it was stopped) is to 
start CMG nodes first, then Meta Storage nodes, then Data nodes, and to stop 
them in the reverse order. Other scenarios are not tested and may lead to 
unexpected behaviour.

Let's describe all possible scenarios and the expected behaviour for each of 
them, and extend the test coverage.

 

*UPD:* We want to check
 * {_}Startup scenarios{_}, where nodes start in different orders, to check 
that the grid assembles and operates correctly. These seem to make sense only 
when a CMG node starts first, because the grid can't be assembled otherwise. 
 * {_}Restart scenarios{_}, where nodes of different roles are stopped and 
then started on various grid configurations, to check service degradation and 
restoration. In contrast to {_}startup scenarios{_}, some grid configurations 
work differently, e.g. a grid without a CMG node.
 * _Stop scenarios_ are covered by the "restart scenarios".


> Describe node start/stop scenarios
> ----------------------------------
>
>                 Key: IGNITE-18171
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18171
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Andrey Mashenkov
>            Assignee: Andrey Mashenkov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
