[jira] [Created] (IGNITE-22514) Failed to get the primary replica if non CMG node is down

Igor (Jira) Fri, 14 Jun 2024 11:47:04 -0700

Igor created IGNITE-22514:
-----------------------------

             Summary: Failed to get the primary replica if non CMG node is down
                 Key: IGNITE-22514
                 URL: https://issues.apache.org/jira/browse/IGNITE-22514
             Project: Ignite
          Issue Type: Bug
          Components: general, jdbc, networking, persistence
    Affects Versions: 3.0.0-beta2
         Environment: A 2-node cluster (1 CMG node).
            Reporter: Igor
         Attachments: CMG node.zip, non CMG killed node.zip


*Steps to reproduce:*
 # Start a cluster of 2 nodes, one of which is a CMG node.
 # Create a zone with the number of replicas equal to the number of nodes (2).
 # Create 10 tables in the zone.
 # Insert 100 rows into every table.
 # Wait until the local state of all tables*partitions*nodes is "HEALTHY".
 # Wait until the global state of all tables*partitions*nodes is "AVAILABLE".
 # Kill the *non-CMG* node with kill -9.
 # Assert that the physical topology contains only 1 alive node.
 # Assert that the logical topology contains only 1 alive node.
 # Wait until the local state of all tables*partitions*nodes is "HEALTHY".
 # Wait until the global state of all tables*partitions*nodes is "READ_ONLY".
 # Execute a select query over JDBC, connecting to the *alive CMG* node.
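
The waiting steps and the final JDBC query could be sketched roughly as follows. This is only an illustration, not the actual test code: the class name, the {{await}} helper, the table name and the JDBC URL (host and port) are assumptions, and the way the partition states are read is deliberately left to the caller because the report does not describe it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.BooleanSupplier;

// Hypothetical helper class; none of these names come from the real test suite.
final class FailoverReproSketch {
    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout.
     * The condition (e.g. "all partitions are READ_ONLY") is supplied by the caller.
     */
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    /** Step 12: runs a select through JDBC against the given node and returns the row count. */
    static long countRows(String jdbcUrl, String table) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Assumed example values: adjust host, port and table name to the cluster under test.
            System.out.println("usage: FailoverReproSketch jdbc:ignite:thin://127.0.0.1:10800 TABLE_NAME");
            return;
        }
        // Requires the Ignite JDBC driver on the classpath and a running node.
        System.out.println("rows = " + countRows(args[0], args[1]));
    }
}
```

With the failure described in this ticket, {{countRows}} would be expected to throw the {{SQLException}} shown under *Actual* instead of returning 100.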

*Expected:*

Data is returned.

*Actual:*

The following exception occurs on step 12:
{code:java}
Failed to get the primary replica [tablePartitionId=10_part_1]
java.sql.SQLException: Failed to get the primary replica [tablePartitionId=10_part_1]
    at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
    at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
    at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
    at org.gridgain.ai3tests.tests.teststeps.JdbcSteps.executeQuery(JdbcSteps.java:91)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.getActualResult(ClusterFailoverTestBase.java:338)
    at org.gridgain.ai3tests.tests.failover.ClusterFailoverTestBase.assertDataIsFilledWithoutErrors(ClusterFailoverTestBase.java:169)
    at org.gridgain.ai3tests.tests.failover.ClusterFailover2NodesTest.singleKillAndCheckOtherNodeWorks(ClusterFailover2NodesTest.java:123)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834){code}
The server logs show the following error repeating continuously:
{code:java}
2024-06-14 18:10:58:719 +0000 [WARNING][%ClusterFailover2NodesTest_cluster_0%Raft-Group-Client-7][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadIndexRequestImpl [entriesList=null, groupId=28_part_1, peerId=ClusterFailover2NodesTest_cluster_1, serverId=ClusterFailover2NodesTest_cluster_1], peer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0], newPeer=Peer [consistentId=ClusterFailover2NodesTest_cluster_1, idx=0]].
java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
  at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
  at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
  at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
  at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
  at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653)
  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.ConnectException: Peer ClusterFailover2NodesTest_cluster_1 is unavailable
  at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
  at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
  ... 7 more{code}
Server logs are in the attachments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
