Andrey Khitrin created IGNITE-21619:
---------------------------------------

             Summary: "Failed to get the primary replica" after massive data insert and node restart
                 Key: IGNITE-21619
                 URL: https://issues.apache.org/jira/browse/IGNITE-21619
             Project: Ignite
          Issue Type: Bug
          Components: sql
    Affects Versions: 3.0.0-beta2
            Reporter: Andrey Khitrin
         Attachments: ignite-config.conf, ignite3db-0.log

Steps to reproduce:

1. Start a 1-node cluster.
2. Create several tables (5, for example) in an aipersist zone.
3. Fill these tables with some data (1000 rows each, for example).
4. Verify that data is accessible via SQL.
5. Restart the node.
6. Try to fetch the same data again.
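The steps above can be scripted over JDBC roughly as follows. This is a minimal sketch, not the reporter's test. Only the {{failoverTest00}} table name, the {{k1}} column and the {{select sum(k1)}} query come from the node log. The JDBC URL, the zone name, the {{CREATE ZONE ... ENGINE aipersist}} / {{WITH PRIMARY_ZONE}} syntax, the remaining table names and the {{v}} column are assumptions about the 3.0.0-beta2 dialect, so adjust them to the build under test. The Ignite 3 JDBC driver must be on the classpath. Run it with {{fill}} before restarting the node (steps 2-4) and without arguments afterwards (step 6). According to this report, the second run fails with "Failed to get the primary replica".

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Sketch of the reproduction steps; DDL syntax, URL and schema are assumptions.
class FailoverRepro {
    static final int TABLES = 5;
    static final int ROWS = 1000;

    // Hypothetical naming scheme modelled on "failoverTest00" from the node log.
    static String tableName(int i) {
        return String.format("failoverTest%02d", i);
    }

    // Steps 2-3: one zone, then one table plus ROWS inserts per table.
    // ASSUMPTION: beta2-era "ENGINE aipersist" and "PRIMARY_ZONE" syntax.
    static List<String> setupStatements(int tables, int rows) {
        List<String> sql = new ArrayList<>();
        sql.add("CREATE ZONE IF NOT EXISTS failover_zone ENGINE aipersist");
        for (int t = 0; t < tables; t++) {
            sql.add("CREATE TABLE " + tableName(t)
                    + " (k1 INT PRIMARY KEY, v VARCHAR) WITH PRIMARY_ZONE='FAILOVER_ZONE'");
            for (int k = 0; k < rows; k++) {
                sql.add("INSERT INTO " + tableName(t) + " VALUES (" + k + ", 'v" + k + "')");
            }
        }
        return sql;
    }

    // Expected "select sum(k1)" result when k1 takes the values 0..rows-1.
    static long expectedSum(int rows) {
        return (long) rows * (rows - 1) / 2;
    }

    // Steps 4 and 6: run the query from the log against every table.
    static void verify(Connection conn, int tables, int rows) throws SQLException {
        try (Statement st = conn.createStatement()) {
            for (int t = 0; t < tables; t++) {
                try (ResultSet rs = st.executeQuery("select sum(k1) from " + tableName(t))) {
                    rs.next();
                    long sum = rs.getLong(1);
                    if (sum != expectedSum(rows)) {
                        throw new IllegalStateException(tableName(t) + ": unexpected sum " + sum);
                    }
                }
            }
        }
    }

    // "fill": create and populate tables, then verify. No argument: verify only.
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:ignite:thin://127.0.0.1:10800"; // ASSUMPTION: default client port
        try (Connection conn = DriverManager.getConnection(url)) {
            if (args.length > 0 && args[0].equals("fill")) {
                try (Statement st = conn.createStatement()) {
                    for (String s : setupStatements(TABLES, ROWS)) {
                        st.executeUpdate(s);
                    }
                }
            }
            verify(conn, TABLES, ROWS);
        }
    }
}
{code}

With 1000 rows per table, each {{select sum(k1)}} should return 499500 before and after the restart.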

Expected result: the data can be fetched.

Actual result: the data is inaccessible; the query fails with "Failed to get the primary replica".

Trace on the client side:
{code}
java.sql.SQLException: Failed to get the primary replica [tablePartitionId=6_part_1]
        at org.apache.ignite.internal.jdbc.proto.IgniteQueryErrorCode.createJdbcSqlException(IgniteQueryErrorCode.java:57)
        at org.apache.ignite.internal.jdbc.JdbcStatement.execute0(JdbcStatement.java:154)
        at org.apache.ignite.internal.jdbc.JdbcStatement.executeQuery(JdbcStatement.java:111)
        ...
{code}

Trace in the node log (attached):
{code}
2024-02-28 12:36:34:807 +0500 [INFO][%ClusterFailoverTest_cluster_0%sql-execution-pool-0][JdbcQueryEventHandlerImpl] Exception while executing query [query=select sum(k1) from failoverTest00]
org.apache.ignite.sql.SqlException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
        at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:61)
        at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.wrapIfNecessary(AsyncSqlCursorImpl.java:180)
        at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.handleError(AsyncSqlCursorImpl.java:157)
        at org.apache.ignite.internal.sql.engine.AsyncSqlCursorImpl.lambda$requestNextAsync$2(AsyncSqlCursorImpl.java:96)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at org.apache.ignite.internal.sql.engine.exec.ExecutionServiceImpl$DistributedQueryManager.lambda$execute$18(ExecutionServiceImpl.java:864)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        at org.apache.ignite.internal.sql.engine.exec.QueryTaskExecutorImpl.lambda$execute$0(QueryTaskExecutorImpl.java:83)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ignite.lang.IgniteException: IGN-CMN-65535 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
        at org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:117)
        at org.apache.ignite.internal.lang.SqlExceptionMapperUtil.mapToPublicSqlException(SqlExceptionMapperUtil.java:51)
        ... 15 more
Caused by: org.apache.ignite.internal.lang.IgniteInternalException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f Failed to get the primary replica [tablePartitionId=6_part_1]
        at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:384)
        at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:446)
        at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:384)
        at org.apache.ignite.internal.sql.engine.SqlQueryProcessor.lambda$primaryReplicas$2(SqlQueryProcessor.java:402)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
        at java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        ... 3 more
Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
        at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990)
        at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
        ... 9 more
Caused by: org.apache.ignite.internal.placementdriver.PrimaryReplicaAwaitTimeoutException: IGN-PLACEMENTDRIVER-1 TraceId:8d366905-a4bb-4333-b0b3-c647a1cf943f The primary replica await timed out [replicationGroupId=6_part_1, referenceTimestamp=HybridTimestamp [physical=2024-02-28 12:36:04:780 +0500, logical=0, composite=112007955400622080], currentLease=Lease [leaseholder=ClusterFailoverTest_cluster_0, leaseholderId=ee143400-ca69-401f-9ff8-6e1cc7e5b394, accepted=false, startTime=HybridTimestamp [physical=2024-02-28 12:36:04:048 +0500, logical=115, composite=112007955352649843], expirationTime=HybridTimestamp [physical=2024-02-28 12:38:04:048 +0500, logical=0, composite=112007963216969728], prolongable=false, replicationGroupId=6_part_1]]
        at org.apache.ignite.internal.placementdriver.leases.LeaseTracker.lambda$awaitPrimaryReplica$5(LeaseTracker.java:276)
        at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
        ... 10 more
Caused by: java.util.concurrent.TimeoutException
        ... 7 more
{code}
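The lease details in the trace are easier to compare once the {{composite}} HybridTimestamp values are decoded. The sketch below assumes composite = (physical milliseconds << 16) | logical. That split reproduces all three physical/logical pairs printed above, but it is an assumption checked against this log, not a call into the Ignite API, and the {{CompositeTimestamp}} class is hypothetical. Decoded, the lease for {{6_part_1}} was issued to {{ClusterFailoverTest_cluster_0}} at 12:36:04.048 for 120 seconds. The reference timestamp of the await is 732 ms later, yet the lease was still {{accepted=false}} when the await timed out. The log entry at 12:36:34.807 is about 30 seconds after the reference timestamp.

{code:java}
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: decodes the "composite" values printed in the log.
// ASSUMPTION: composite = (physical epoch millis << 16) | logical counter.
class CompositeTimestamp {
    static final int LOGICAL_BITS = 16;

    static long physicalMillis(long composite) {
        return composite >>> LOGICAL_BITS;
    }

    static int logical(long composite) {
        return (int) (composite & ((1L << LOGICAL_BITS) - 1));
    }

    // Same layout and offset (+0500) as the node log.
    static String format(long composite) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS")
                .withZone(ZoneOffset.ofHours(5));
        return fmt.format(Instant.ofEpochMilli(physicalMillis(composite)));
    }

    public static void main(String[] args) {
        long reference = 112007955400622080L;   // referenceTimestamp
        long leaseStart = 112007955352649843L;  // Lease startTime
        long leaseExpiry = 112007963216969728L; // Lease expirationTime

        System.out.println("reference   " + format(reference) + " logical=" + logical(reference));
        System.out.println("leaseStart  " + format(leaseStart) + " logical=" + logical(leaseStart));
        System.out.println("leaseExpiry " + format(leaseExpiry) + " logical=" + logical(leaseExpiry));
        // Lease duration, and how long after the lease start the await began.
        System.out.println("lease length ms = "
                + (physicalMillis(leaseExpiry) - physicalMillis(leaseStart)));
        System.out.println("reference - start ms = "
                + (physicalMillis(reference) - physicalMillis(leaseStart)));
    }
}
{code}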

The issue is *not* reproducible in the following configurations:
* aipersist with 2 nodes
* rocksdb with 1 or 2 nodes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
