[
https://issues.apache.org/jira/browse/IGNITE-20410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Lapin updated IGNITE-20410:
-------------------------------------
Description:
ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication
fails with ReplicationTimeoutException because of Metadata unavailability on
replication node.
With some extra debug output I got following result:
{code:java}
applyUpdateCommand
applyCmdWithExceptionHandling messageType = 43, groupType = 9
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
...
...
...
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
applyCmdWithExceptionHandling RuntimeException
Replication is timed out [replicaGrpId=1_part_0]
org.apache.ignite.tx.TransactionException: IGN-REP-3
TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out
[replicaGrpId=1_part_0]
at
app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
at
app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
at
app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
at
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
at
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
at
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
at
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
at
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
at
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at
app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
at
[email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
at
[email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
at
[email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
at
[email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at
[email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at
[email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by:
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException:
IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out
[replicaGrpId=1_part_0]
... 9 more {code}
The problem occurs when test invariant is broken, meaning that despite the
transferLeadershipsTo(notInhibitedNodeIndex);
the leader re-elected to {{{}nodeToInhibitMetaStorage{}}}, meaning that we do
expect node_0 (not inhibited) as leader, however the leader is node_1
(inhibited).
Node to inhibit issart_lsppdr_1
Inhibit start 1694674876362
Before put
!!! Before wait
!!! After wait
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3, available level 2; rejecting
ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3, available level 2; rejecting
ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
Please pay attention to the *node = issart_lsppdr_1* in complete resp =
ErrorResponseImpl [errorCode=1009 it's the same as
Node to inhibit issart_lsppdr_1
was:
ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication
fails with ReplicationTimeoutException because of Metadata unavailability on
replication node.
With some extra debug output I got following result:
{code:java}
applyUpdateCommand
applyCmdWithExceptionHandling messageType = 43, groupType = 9
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
...
...
...
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
available, group '1_part_0', required level 3; rejecting ActionRequest with
EBUSY., leaderId=null], err = null
applyCmdWithExceptionHandling RuntimeException
Replication is timed out [replicaGrpId=1_part_0]
org.apache.ignite.tx.TransactionException: IGN-REP-3
TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out
[replicaGrpId=1_part_0]
at
app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
at
app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
at
app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
at
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
at
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
at
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at
app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
at
[email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
at
[email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
at
[email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at
[email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at
app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
at
[email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
at
[email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
at
[email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
at
[email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at
[email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at
[email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at
[email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at
[email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by:
org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException:
IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out
[replicaGrpId=1_part_0]
... 9 more {code}
> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication
> fails with ReplicationTimeoutException
> -------------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-20410
> URL: https://issues.apache.org/jira/browse/IGNITE-20410
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexander Lapin
> Priority: Major
>
> ItSchemaSyncAndReplicationTest#laggingSchemasPreventPartitionDataReplication
> fails with ReplicationTimeoutException because of Metadata unavailability on
> replication node.
> With some extra debug output I got following result:
> {code:java}
> applyUpdateCommand
> applyCmdWithExceptionHandling messageType = 43, groupType = 9
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> ...
> ...
> ...
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3; rejecting ActionRequest with
> EBUSY., leaderId=null], err = null
> applyCmdWithExceptionHandling RuntimeException
> Replication is timed out [replicaGrpId=1_part_0]
> org.apache.ignite.tx.TransactionException: IGN-REP-3
> TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed out
> [replicaGrpId=1_part_0]
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$0(ExceptionUtils.java:378)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:461)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:378)
> at
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.wrapReplicationException(InternalTableImpl.java:1630)
> at
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$postEnlist$9(InternalTableImpl.java:521)
> at
> [email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
> at
> [email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
> at
> [email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> at
> [email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
> at
> app//org.apache.ignite.internal.table.distributed.storage.InternalTableImpl.lambda$enlistWithRetry$5(InternalTableImpl.java:495)
> at
> [email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
> at
> [email protected]/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
> at
> [email protected]/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> at
> [email protected]/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
> at
> app//org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$3(ReplicaService.java:97)
> at
> [email protected]/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> at
> [email protected]/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> at
> [email protected]/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
> at
> [email protected]/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> at
> [email protected]/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> at
> [email protected]/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> at
> [email protected]/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> at
> [email protected]/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Caused by:
> org.apache.ignite.internal.replicator.exception.ReplicationTimeoutException:
> IGN-REP-3 TraceId:b5ea497a-7985-414a-8b44-8a880704bf80 Replication is timed
> out [replicaGrpId=1_part_0]
> ... 9 more {code}
> The problem occurs when test invariant is broken, meaning that despite the
> transferLeadershipsTo(notInhibitedNodeIndex);
> the leader re-elected to {{{}nodeToInhibitMetaStorage{}}}, meaning that we do
> expect node_0 (not inhibited) as leader, however the leader is node_1
> (inhibited).
> Node to inhibit issart_lsppdr_1
> Inhibit start 1694674876362
> Before put
> !!! Before wait
> !!! After wait
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3, available level 2; rejecting
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
> complete resp = ErrorResponseImpl [errorCode=1009, errorMsg=Metadata not yet
> available, group '1_part_0', required level 3, available level 2; rejecting
> ActionRequest with EBUSY node = issart_lsppdr_1, leaderId=null], err = null
> Please pay attention to the *node = issart_lsppdr_1* in complete resp =
> ErrorResponseImpl [errorCode=1009 it's the same as
> Node to inhibit issart_lsppdr_1
--
This message was sent by Atlassian Jira
(v8.20.10#820010)