Andrey Khitrin created IGNITE-21619:
---------------------------------------

Summary: "Failed to get the primary replica" after massive data insert and node restart
Key: IGNITE-21619
URL: https://issues.apache.org/jira/browse/IGNITE-21619
Project: Ignite
Issue Type: Bug
Components: sql
Affects Versions: 3.0.0-beta2
Reporter: Andrey Khitrin
Attachments: ignite-config.conf, ignite3db-0.log

Steps to reproduce:
1. Start a 1-node cluster.
2. Create several tables (5, for example) in an aipersist zone.
3. Fill these tables with some data (1000 rows each, for example).
4. Verify that the data is accessible via SQL.
5. Restart the node.
6. Try to fetch the same data again.

Expected result: the data can be fetched.
Actual result: the data is inaccessible; the query fails.

Trace on the client side:
{code}
java.sql.SQLException: Failed to get the primary replica [tablePartitionId=6_part_1]
	at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
	at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
	at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
	...
{code}
Trace in node log (attached):
{code}
2024-02-28 12:36:34:807 +0500 [INFO][%ClusterFailoverTest_cluster_0%sql-execution-pool-0][JdbcQueryEventHandlerImpl] Exception while executing query [query=select sum(k1) from failoverTest00]
org.apache.ignite.sql.SqlException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
	at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:61)
	at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:180)
	at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.handleError(AsyncSqlCursorImpl.java:157)
	at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$2(AsyncSqlCursorImpl.java:96)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$18(ExecutionServiceImpl.java:864)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
	at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:83)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
	at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:117)
	at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:51)
	... 15 more
Caused by: org.apache.ignite.internal.lang.IgniteInternalException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
	at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:384)
	at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:446)
	at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:384)
	at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$primaryReplicas$2(SqlQueryProcessor.java:402)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	... 3 more
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
	at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
	at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
	... 9 more
Caused by: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
	at org.apache.ignite.internal.placementdriver.leases.LeaseTracker.lambda$awaitPrimaryReplica$5(LeaseTracker.java:276)
	at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
	... 10 more
Caused by: java.util.concurrent.TimeoutException
	... 7 more
{code}
The issue is *not* reproducible in the following configurations:
* aipersist with 2 nodes
* rocksdb with 1 or 2 nodes

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
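The HybridTimestamp values printed in the PrimaryReplicaAwaitTimeoutException can be cross-checked with a short Java sketch. It assumes that a composite timestamp packs the physical time (milliseconds since the Unix epoch) into the high bits and a 16-bit logical counter into the low bits; every physical/logical/composite triple in the trace is consistent with that layout. `LeaseTimeline` is a hypothetical helper written for this analysis, not part of the Ignite API. Decoded, the lease for 6_part_1 was issued about 0.7 s before the reference timestamp and spans 120 s, yet it is still `accepted=false`. The failure was logged at 12:36:34:807, roughly 30 s after the reference timestamp, which suggests that the await ran until its timeout without the lease ever being accepted.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper that decodes the composite HybridTimestamp values printed
 * in the PrimaryReplicaAwaitTimeoutException of IGNITE-21619.
 * Assumed layout (matches every triple in the trace):
 * composite = (physicalMillis << 16) | logical.
 */
final class LeaseTimeline {
    /** Assumed width of the logical counter stored in the low bits. */
    static final int LOGICAL_BITS = 16;

    /** The node log in the report prints times with a +05:00 offset. */
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS xx").withZone(ZoneOffset.ofHours(5));

    /** Physical part: milliseconds since the Unix epoch. */
    static long physicalMillis(long composite) {
        return composite >>> LOGICAL_BITS;
    }

    /** Logical part: counter that orders events within the same millisecond. */
    static int logical(long composite) {
        return (int) (composite & ((1L << LOGICAL_BITS) - 1));
    }

    /** Renders a composite value in the same format the node log uses. */
    static String format(long composite) {
        return LOG_FORMAT.format(Instant.ofEpochMilli(physicalMillis(composite)))
                + " logical=" + logical(composite);
    }

    public static void main(String[] args) {
        long reference = 112007955400622080L;   // referenceTimestamp
        long leaseStart = 112007955352649843L;  // currentLease.startTime
        long leaseExpiry = 112007963216969728L; // currentLease.expirationTime

        System.out.println("referenceTimestamp: " + format(reference));
        System.out.println("lease startTime:    " + format(leaseStart));
        System.out.println("lease expiration:   " + format(leaseExpiry));
        System.out.println("lease duration, ms: "
                + (physicalMillis(leaseExpiry) - physicalMillis(leaseStart)));
        System.out.println("lease issued before reference, ms: "
                + (physicalMillis(reference) - physicalMillis(leaseStart)));
    }
}
```

Running it reproduces the three timestamps exactly as the trace prints them (12:36:04:780, 12:36:04:048 with logical=115, and 12:38:04:048), a lease duration of 120000 ms, and a 732 ms gap between the lease start and the reference timestamp.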