[jira] [Updated] (IGNITE-20410) ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication fails with ReplicationTimeoutException

Alexander Lapin (Jira) Thu, 14 Sep 2023 00:57:04 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexander Lapin updated IGNITE-20410:
-------------------------------------
    Description: 
ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
fails with ReplicationTimeoutException because of Metadata unavailability on 
replication node.

With some extra debug output I got following result:
{code:java}
applyUpdateCommand
applyCmdWithExceptionHandling messageType = 43, groupType = 9
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
...
...
...
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
applyCmdWithExceptionHandling RuntimeException

Replication is timed out [replicaGrpId=1_part_0]
org.apache.ignite.tx.TransactionException: IGN-REP-3 
TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
[replicaGrpId=1_part_0]
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at 
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at 
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at 
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at 
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at 
app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at 
[email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
        at 
[email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at 
[email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: 
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
[replicaGrpId=1_part_0]
        ... 9 more {code}
The problem occurs when test invariant is broken, meaning that despite the
transferLeadershipsTo(notInhibitedNodeIndex);
the leader re-elected to {{{}nodeToInhibitMetaStorage{}}}, meaning that we do 
expect node_0 (not inhibited) as leader, however the leader is node_1 
(inhibited).
Node to inhibit issart_lsppdr_1
Inhibit start 1694674876362
Before put
!!! Before wait
!!! After wait
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3, available level 2; rejecting 
ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3, available level 2; rejecting 
ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
Please pay attention to the *node = issart_lsppdr_1* in complete resp = 
ErrorResponseImpl [errorCode=1009 it's the same as
Node to inhibit issart_lsppdr_1

  was:
ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
fails with ReplicationTimeoutException because of Metadata unavailability on 
replication node.

With some extra debug output I got following result:
{code:java}
applyUpdateCommand
applyCmdWithExceptionHandling messageType = 43, groupType = 9
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
...
...
...
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
available, group '1_part_0', required level 3; rejecting ActionRequest with 
EBUSY., leaderId=null], err = null
applyCmdWithExceptionHandling RuntimeException

Replication is timed out [replicaGrpId=1_part_0]
org.apache.ignite.tx.TransactionException: IGN-REP-3 
TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
[replicaGrpId=1_part_0]
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
        at 
app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at 
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at 
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at 
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
        at 
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at 
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at 
app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
        at 
[email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
        at 
[email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at 
[email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
        at 
[email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at 
[email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at 
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at 
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: 
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
[replicaGrpId=1_part_0]
        ... 9 more {code}


> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
> fails with ReplicationTimeoutException
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-20410
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20410
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Priority: Major
>
> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication 
> fails with ReplicationTimeoutException because of Metadata unavailability on 
> replication node.
> With some extra debug output I got following result:
> {code:java}
> applyUpdateCommand
> applyCmdWithExceptionHandling messageType = 43, groupType = 9
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> ...
> ...
> ...
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3; rejecting ActionRequest with 
> EBUSY., leaderId=null], err = null
> applyCmdWithExceptionHandling RuntimeException
> Replication is timed out [replicaGrpId=1_part_0]
> org.apache.ignite.tx.TransactionException: IGN-REP-3 
> TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out 
> [replicaGrpId=1_part_0]
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
>       at 
> app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>       at 
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
>       at 
> app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
>       at 
> [email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
>       at 
> [email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
>       at 
> [email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
>       at 
> [email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
>       at 
> [email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
>       at 
> [email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Caused by: 
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException: 
> IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed 
> out [replicaGrpId=1_part_0]
>       ... 9 more {code}
> The problem occurs when test invariant is broken, meaning that despite the
> transferLeadershipsTo(notInhibitedNodeIndex);
> the leader re-elected to {{{}nodeToInhibitMetaStorage{}}}, meaning that we do 
> expect node_0 (not inhibited) as leader, however the leader is node_1 
> (inhibited).
> Node to inhibit issart_lsppdr_1
> Inhibit start 1694674876362
> Before put
> !!! Before wait
> !!! After wait
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3, available level 2; rejecting 
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet 
> available, group '1_part_0', required level 3, available level 2; rejecting 
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
> Please pay attention to the *node = issart_lsppdr_1* in complete resp = 
> ErrorResponseImpl [errorCode=1009 it's the same as
> Node to inhibit issart_lsppdr_1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-20410) ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication fails with ReplicationTimeoutException

Reply via email to