Igor created IGNITE-22514:
-----------------------------
Summary: Failed to get the primary replica if non CMG node is down
Key: IGNITE-22514
URL: https://issues.apache.org/jira/browse/IGNITE-22514
Project: Ignite
Issue Type: Bug
Components: general, jdbc, networking, persistence
Affects Versions: 3.0.0-beta2
Environment: A 2-node cluster (1 CMG node).
Reporter: Igor
Attachments: CMG node.zip, non CMG killed node.zip
*Steps to reproduce:*
# Start a cluster of 2 nodes with one CMG node.
# Create a zone with the number of replicas equal to the number of nodes (2).
# Create 10 tables inside the zone.
# Insert 100 rows into every table.
# Wait until the local state is "HEALTHY" for all tables, partitions, and nodes.
# Wait until the global state is "AVAILABLE" for all tables, partitions, and nodes.
# Kill the *non-CMG* node with kill -9.
# Assert that the physical topology contains only 1 alive node.
# Assert that the logical topology contains only 1 alive node.
# Wait until the local state is "HEALTHY" for all tables, partitions, and nodes.
# Wait until the global state is "READ_ONLY" for all tables, partitions, and nodes.
# Execute a SELECT query over JDBC, connecting to the *alive CMG* node.
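A minimal sketch of how the "wait" steps (5, 6, 10, 11) and the final query (step 12) might be scripted. This is not the code of the failing test: {{AwaitSketch}}, {{await}}, {{countRows}}, the timeouts, and the table name are hypothetical, and the JDBC URL assumes the Ignite 3 thin JDBC driver listening on the default client port 10800. How the partition states are actually read is left as a stand-in condition.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.Duration;
import java.util.function.BooleanSupplier;

class AwaitSketch {
    /**
     * Polls the condition until it holds or the timeout expires.
     * Returns true if the condition became true, false on timeout.
     */
    static boolean await(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    /**
     * Step 12 (assumed shape): run a SELECT over JDBC against the alive node.
     * Requires the Ignite JDBC driver on the classpath at run time.
     */
    static long countRows(String jdbcUrl, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for "all partitions report the expected state":
        // here the condition simply becomes true after 100 ms.
        long start = System.nanoTime();
        boolean ready = await(
                () -> System.nanoTime() - start > Duration.ofMillis(100).toNanos(),
                Duration.ofSeconds(5),
                Duration.ofMillis(10));
        System.out.println("condition reached: " + ready);

        // Hypothetical call for step 12 (needs a running cluster, so it is left commented out):
        // long rows = countRows("jdbc:ignite:thin://127.0.0.1:10800", "TABLE_1");
    }
}
```

In this report the wait steps succeed and the call corresponding to {{countRows}} is the one that throws.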
*Expected:*
Data is returned.
*Actual:*
Step 12 fails with the following exception:
{code:java}
Failed to get the primary replica [tablePartitionId=10_part_1]
java.sql.SQLException: Failed to get the primary replica [tablePartitionId=10_part_1]
    at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
    at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
    at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
    at org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:91)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getActualResult(ClusterFailoverTestBase.java:338)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.assertDataIsFilledWithoutErrors(ClusterFailoverTestBase.java:169)
    at org.gridgain.ai3tests.tests.failover.ClusterFailover2NodesTest.singleKillAndCheckOtherNodeWorks(ClusterFailover2NodesTest.java:123)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
{code}
The server logs contain continuous errors like this one:
{code:java}
2024-06-14 18:10:58:719 +0000 [WARNING][%ClusterFailover2NodesTest_cluster_0%Raft-Group-Client-7][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadIndexRequestImpl [entriesList=null, groupId=28_part_1, peerId=ClusterFailover2NodesTest_cluster_1, serverId=ClusterFailover2NodesTest_cluster_1], peer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0], newPeer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0]].
java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
    at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
    at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
    ... 7 more
{code}
Server logs are in the attachments.