[
https://issues.apache.org/jira/browse/IGNITE-26547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029607#comment-18029607
]
Vladislav Pyatkov commented on IGNITE-26547:
--------------------------------------------
h2. Test Plan
h3. Unit Test
Individual tests for the CmgRaftService and MetaStorageServiceImpl services
without launching the entire cluster.
h4. CmgRaftService Tests — Majority Lost
# Deploy CMG on three nodes and verify that all API requests work correctly.
# Stop the majority (2 out of 3 nodes).
# All methods must return a CompletableFuture that completes exceptionally
with a SystemGroupUnavailableException (the message should clearly indicate
that it refers to the CMG).
# Start one node (the majority is restored).
# Verify that all API methods function correctly again.
h4. CmgRaftService Tests — All Nodes Unavailable
# Deploy CMG on three nodes and verify that all API requests work correctly.
# Stop all CMG nodes.
# All methods must return a CompletableFuture that completes exceptionally
with a SystemGroupUnavailableException (the message should clearly indicate
that it refers to the CMG).
# Bring all three nodes back online (CMG is restored).
# Verify that all API methods work correctly again.
h4. MetaStorageServiceImpl Tests — Majority Lost
# Deploy MG on three nodes and verify that all API requests work correctly.
# Stop the majority (2 out of 3 nodes).
# All methods must return a CompletableFuture that completes exceptionally
with a SystemGroupUnavailableException (the message should clearly indicate
that it refers to the MG).
# Start one node (the majority is restored).
# Verify that all API methods work correctly again.
h4. MetaStorageServiceImpl Tests — All Nodes Unavailable
# Deploy MG on three nodes and verify that all API requests work correctly.
# Stop all MG nodes.
# All methods must return a CompletableFuture that completes exceptionally
with a SystemGroupUnavailableException (the message should clearly indicate
that it refers to the MG).
# Bring all three nodes back online (MG is restored).
# Verify that all API methods work correctly again.
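The expectation repeated in the four unit-test scenarios above (a future that
fails with SystemGroupUnavailableException whose message names the group) could
be checked with a small helper along these lines. This is only a sketch: the
exception class below is a stand-in declared locally so the example compiles on
its own, and the helper names (failureOf, failedBecauseGroupUnavailable) and
the one-second timeout are assumptions, not part of Ignite.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SystemGroupAssertions {
    /** Hypothetical stand-in so the sketch compiles without Ignite on the classpath. */
    static class SystemGroupUnavailableException extends RuntimeException {
        SystemGroupUnavailableException(String message) {
            super(message);
        }
    }

    /** Returns the failure the future completed with, or null if it completed normally. */
    static Throwable failureOf(CompletableFuture<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return null;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            // Peel off any CompletionException wrappers added by dependent stages.
            while (cause instanceof CompletionException && cause.getCause() != null) {
                cause = cause.getCause();
            }
            return cause;
        } catch (TimeoutException | CancellationException e) {
            // A hang or a cancellation is a failure, just not the expected one.
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
    }

    /** True if the future failed with the expected exception and its message names the group. */
    static boolean failedBecauseGroupUnavailable(CompletableFuture<?> future, String groupName) {
        Throwable failure = failureOf(future, 1_000);
        return failure instanceof SystemGroupUnavailableException
                && failure.getMessage() != null
                && failure.getMessage().contains(groupName);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> cmgCall = CompletableFuture.failedFuture(
                new SystemGroupUnavailableException("CMG is unavailable: majority lost"));
        System.out.println(failedBecauseGroupUnavailable(cmgCall, "CMG"));
        System.out.println(failedBecauseGroupUnavailable(CompletableFuture.completedFuture(null), "CMG"));
    }
}
```

Using a bounded get() rather than join() keeps a test from hanging forever if
a service call never completes while the group is down.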
h3. Integration Testing
h4. RO Transactions (Implicit)
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone (with a data node filter that includes only
nodes 2 and 3), create a table, and preload data.
# Stop the MG node.
# Attempt to retrieve all records (SELECT * FROM table); a
SystemGroupUnavailableException should be thrown (the message should clearly
indicate that it refers to the MG).
h4. RW Transactions (Implicit)
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone (with a data node filter that includes only
nodes 2 and 3), create a table, and preload data.
# Stop the MG node.
# Attempt to insert an additional record (INSERT INTO table (id, val)); a
SystemGroupUnavailableException should be thrown (the message should clearly
indicate that it refers to the MG).
h4. RO Transactions (Explicit)
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone (with a data node filter that includes only
nodes 2 and 3), create a table, and preload data.
# Stop the MG node.
# Attempt to retrieve all records within an explicit transaction (SELECT *
FROM table); a SystemGroupUnavailableException should be thrown, followed by a
transaction rollback (the message should clearly indicate that it refers to the
MG).
h4. RW Transactions (Explicit)
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone (with a data node filter that includes only
nodes 2 and 3), create a table, and preload data.
# Stop the MG node.
# Attempt to insert an additional record within an explicit transaction
(INSERT INTO table (id, val)); a SystemGroupUnavailableException should be
thrown, followed by a transaction rollback (the message should clearly indicate
that it refers to the MG).
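For the two explicit-transaction scenarios above, the expected sequence
(operation fails, transaction is rolled back, exception reaches the caller)
can be sketched as follows. The Tx interface, RecordingTx, and runInTx are
hypothetical names introduced for illustration and are not the Ignite
transaction API; the exception class is again a local stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ExplicitTxRollbackSketch {
    /** Hypothetical stand-in for the exception named in the plan. */
    static class SystemGroupUnavailableException extends RuntimeException {
        SystemGroupUnavailableException(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal transaction handle; not the Ignite Transaction interface. */
    interface Tx {
        void commit();

        void rollback();
    }

    /** Records which terminal operation was invoked so a test can inspect it. */
    static class RecordingTx implements Tx {
        final List<String> calls = new ArrayList<>();

        @Override
        public void commit() {
            calls.add("commit");
        }

        @Override
        public void rollback() {
            calls.add("rollback");
        }
    }

    /** Runs the body; rolls back and rethrows on failure, commits otherwise. */
    static void runInTx(Tx tx, Consumer<Tx> body) {
        try {
            body.accept(tx);
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
        tx.commit();
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        try {
            // Simulates the SELECT or INSERT failing because the MG node is stopped.
            runInTx(tx, t -> {
                throw new SystemGroupUnavailableException("MG is unavailable");
            });
        } catch (SystemGroupUnavailableException e) {
            System.out.println("caught: " + e.getMessage() + ", calls: " + tx.calls);
        }
    }
}
```

A real test would assert both halves: that the exception surfaces to the
caller and that the transaction ends in the rolled-back state.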
h4. Non-Transactional Calls via Public API — Retrieving Tables or a Specific Table
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone, create a table, and preload data.
# Stop the MG node.
# Attempt to retrieve the table via the API
(ignite.tables().table("table_name")); a SystemGroupUnavailableException should
be thrown (the message should clearly indicate that it refers to the MG).
h4. Non-Transactional Calls via Public API — Database Object Creation
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone, create a table, and preload data.
# Stop the MG node.
# Attempt to create a table (CREATE TABLE IF NOT EXISTS TEST(ID INT PRIMARY
KEY, NAME VARCHAR) ZONE TEST_ZONE); a SystemGroupUnavailableException should be
thrown.
h4. System Procedures Using CMG and/or MG — Adding a Node to the Topology
# Start three nodes, with the CMG hosted on a single node.
# Create a distribution zone, create a table, and preload data.
# Stop the CMG node.
# Attempt to start a fourth node (which was not previously part of the
cluster). The join is expected to hang while the CMG is unavailable.
# Bring the CMG node back online.
# Verify that all nodes join the topology and the cluster now has four nodes.
h4. System Procedures Using CMG and/or MG — Deferred Scale Up/Down Adjust
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone (specifying a data node filter and sufficiently
large values for data_nodes_auto_adjust_scale_up and
data_nodes_auto_adjust_scale_down), create a table, and preload data.
# Remove node 2 and add node 4 so that both nodes match the filter criteria.
# Verify that no adjustment has occurred yet (data nodes = \{2,3}).
# Stop the MG node.
# Wait long enough for both auto-adjust timers to expire.
# Bring the MG node back online.
# Wait long enough, then verify that the auto-adjustment has completed (data
nodes = \{3,4}).
h4. Cluster Restart
# Start three nodes, with the MG hosted on a single node.
# Create a distribution zone, create a table, and preload data.
# Query the data and verify that all preloaded records are available.
# Stop the cluster nodes in random order, introducing a delay of 1–5 seconds
between each shutdown.
# Start all cluster nodes again, introducing a delay of 1–5 seconds between
each startup.
# Repeat steps 3–5 several times.
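The randomized shutdown in step 4 of the Cluster Restart scenario could be
driven by a precomputed plan such as the one sketched below. The class and
method names (RestartSchedule, Step, randomShutdownPlan) and the node names are
made up for illustration; the sketch only builds the order and the delays and
does not stop any real node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RestartSchedule {
    /** One planned action: which node to stop and how long to wait before the next one. */
    static class Step {
        final String node;
        final long delayMillis;

        Step(String node, long delayMillis) {
            this.node = node;
            this.delayMillis = delayMillis;
        }

        @Override
        public String toString() {
            return node + " then wait " + delayMillis + " ms";
        }
    }

    /** Shuffles the node order and assigns each step a delay between 1 and 5 seconds inclusive. */
    static List<Step> randomShutdownPlan(List<String> nodes, Random random) {
        List<String> order = new ArrayList<>(nodes);
        Collections.shuffle(order, random);

        List<Step> plan = new ArrayList<>();
        for (String node : order) {
            long delayMillis = 1_000 + random.nextInt(4_001); // 1000..5000 ms
            plan.add(new Step(node, delayMillis));
        }
        return plan;
    }

    public static void main(String[] args) {
        // A fixed seed makes a failing order reproducible when the test is rerun.
        Random random = new Random(42);
        for (Step step : randomShutdownPlan(List.of("node1", "node2", "node3"), random)) {
            System.out.println("stop " + step);
        }
    }
}
```

Logging the seed alongside the test result is one way to replay a specific
shutdown order that exposed a problem.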
> Extend test coverage for CMG/MG restarts
> ----------------------------------------
>
> Key: IGNITE-26547
> URL: https://issues.apache.org/jira/browse/IGNITE-26547
> Project: Ignite
> Issue Type: Improvement
> Reporter: Vladislav Pyatkov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
> Fix For: 3.2
>
>
> h3. Motivation
> System replication groups (CMG, MG) can become unavailable involuntarily,
> for example when nodes leave the topology, or intentionally, when a user
> changes the topology and reassigns system group nodes.
> Currently, the system behavior is undefined when any of these system groups
> is unavailable, and there are no tests for this scenario.
> h3. Definition of done
> A set of tests should be created to verify the system behavior under system
> group unavailability.
> The tests will be implemented once the behavior has been designed and is
> supported by the system.