[
https://issues.apache.org/jira/browse/IGNITE-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mirza Aliev updated IGNITE-21619:
---------------------------------
Epic Link: IGNITE-21389
> "Failed to get the primary replica" after massive data insert and node restart
> ------------------------------------------------------------------------------
>
> Key: IGNITE-21619
> URL: https://issues.apache.org/jira/browse/IGNITE-21619
> Project: Ignite
> Issue Type: Bug
> Components: sql
> Affects Versions: 3.0.0-beta2
> Reporter: Andrey Khitrin
> Priority: Major
> Labels: ignite-3, sql
> Attachments: ignite-config.conf, ignite3db-0.log
>
>
> Steps to reproduce:
> 1. Start a 1-node cluster.
> 2. Create several tables (5, for example) in an aipersist zone.
> 3. Fill these tables with some data (1000 rows each, for example).
> 4. Verify that data is accessible via SQL.
> 5. Restart the node.
> 6. Try to fetch the same data again.
> Expected result: the data can be fetched.
> Actual result: data is inaccessible.
> Trace on the client side:
> {code}
> java.sql.SQLException: Failed to get the primary replica [tablePartitionId=6_part_1]
> at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
> at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
> at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
> ...
> {code}
> Trace in node log (attached):
> {code}
> 2024-02-28 12:36:34:807 +0500 [INFO][%ClusterFailoverTest_cluster_0%sql-execution-pool-0][JdbcQueryEventHandlerImpl] Exception while executing query [query=select sum(k1) from failoverTest00]
> org.apache.ignite.sql.SqlException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
> at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:61)
> at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:180)
> at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.handleError(AsyncSqlCursorImpl.java:157)
> at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$2(AsyncSqlCursorImpl.java:96)
> at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
> at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$18(ExecutionServiceImpl.java:864)
> at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
> at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
> at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
> at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:83)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
> at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:117)
> at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:51)
> ... 15 more
> Caused by: org.apache.ignite.internal.lang.IgniteInternalException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
> at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:384)
> at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:446)
> at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:384)
> at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$primaryReplicas$2(SqlQueryProcessor.java:402)
> at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
> at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> ... 3 more
> Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
> at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
> at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
> at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
> ... 9 more
> Caused by: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
> at org.apache.ignite.internal.placementdriver.leases.LeaseTracker.lambda$awaitPrimaryReplica$5(LeaseTracker.java:276)
> at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
> ... 10 more
> Caused by: java.util.concurrent.TimeoutException
> ... 7 more
> {code}
> The issue is *not* reproducible in the following configurations:
> * aipersist with 2 nodes
> * rocksdb with 1 or 2 nodes
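
The reproduction steps in the description could be sketched over plain JDBC roughly as follows. This is only an illustration, not the reporter's actual test: the table name pattern and the `k1` column are inferred from the logged query (`select sum(k1) from failoverTest00`), while the `v1` column, the class and method names, and the `tableOptions` argument are hypothetical. The clause that places the tables in an aipersist zone is deliberately left to the caller via `tableOptions`, since its exact syntax depends on the build. The node restart (step 5) is done manually between `fill` and `verify`.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Ignite or of the original test.
final class RestartReproSketch {
    static final int TABLES = 5;    // "5, for example"
    static final int ROWS = 1000;   // "1000 rows each, for example"

    // failoverTest00, failoverTest01, ... (numbering inferred from the logged query).
    static String tableName(int i) {
        return String.format("failoverTest%02d", i);
    }

    // Steps 2-3: DDL and INSERT statements for all tables.
    // tableOptions is appended verbatim to each CREATE TABLE (assumption:
    // the caller passes whatever clause selects the aipersist zone, or "").
    static List<String> fillStatements(String tableOptions) {
        List<String> sql = new ArrayList<>();
        for (int t = 0; t < TABLES; t++) {
            sql.add("CREATE TABLE " + tableName(t)
                    + " (k1 INT PRIMARY KEY, v1 VARCHAR)" + tableOptions);
            for (int k = 0; k < ROWS; k++) {
                sql.add("INSERT INTO " + tableName(t)
                        + " (k1, v1) VALUES (" + k + ", 'v" + k + "')");
            }
        }
        return sql;
    }

    // Sum of k1 = 0..ROWS-1, the value verify() expects from each table.
    static long expectedSum() {
        return (long) ROWS * (ROWS - 1) / 2;
    }

    // Steps 2-3 executed against a live connection.
    static void fill(Connection conn, String tableOptions) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (String sql : fillStatements(tableOptions)) {
                st.executeUpdate(sql);
            }
        }
    }

    // Steps 4 and 6: run once before and once after the restart.
    static void verify(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (int t = 0; t < TABLES; t++) {
                String query = "select sum(k1) from " + tableName(t);
                try (ResultSet rs = st.executeQuery(query)) {
                    if (!rs.next() || rs.getLong(1) != expectedSum()) {
                        throw new IllegalStateException("Unexpected result for: " + query);
                    }
                }
            }
        }
    }
}
```

With a connection obtained from `DriverManager.getConnection(...)` for the cluster's JDBC URL, the sequence would be `fill(conn, options)`, `verify(conn)`, a manual node restart, then `verify(conn)` again on a fresh connection; per the description, the second `verify` is where the `SQLException` is expected to surface.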