[
https://issues.apache.org/jira/browse/IGNITE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Uttsel updated IGNITE-17959:
-----------------------------------
Description:
h2. *Motivation*
Frequently in long running tests can be observed (full msg in attach) :
{noformat}
Caused by:
org.apache.ignite.internal.replicator.exception.ReplicaUnavailableException:
IGN-REP-5 TraceId:82267e0a-aca2-47a3-806e-7922ed61d6d3 Replica is not ready
[replicationGroupId=b5b3a2e5-1342-4a90-97b3-a46e9509a1d6_part_5,
nodeName=iist_n_1]
{noformat}
check for example test: ItIndexSpoolTest.test, numerous runs or run until
failure will highlight the problem.
Additionally we can observe (err2.log attached):
{noformat}
2022-10-24 13:23:52:308 +0300
[WARNING][%iist_n_1%Raft-Group-Client-4][RaftGroupServiceImpl] Recoverable
error during the request type=ActionRequestImpl occurred (will be retried on
the randomly selected node):
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
... 7 more
2022-10-24 13:24:13:437 +0300
[WARNING][%iist_n_1%Raft-Group-Client-2][RaftGroupServiceImpl] Recoverable
error during the request type=ActionRequestImpl occurred (will be retried on
the randomly selected node):
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
... 7 more
{noformat}
h2. *Implementation Notes*
The root cause is that there is a race between a table creation and an
operation on the table. So it's possible that Replica isn't created in
ReplicaManager when rw/ro operation is invoked. It's need to retry the
operation several times if ReplicaUnavailableException is thrown.
was:
Frequently in long running tests can be observed (full msg in attach) :
{noformat}
Caused by:
org.apache.ignite.internal.replicator.exception.ReplicaUnavailableException:
IGN-REP-5 TraceId:82267e0a-aca2-47a3-806e-7922ed61d6d3 Replica is not ready
[replicationGroupId=b5b3a2e5-1342-4a90-97b3-a46e9509a1d6_part_5,
nodeName=iist_n_1]
{noformat}
check for example test: ItIndexSpoolTest.test, numerous runs or run until
failure will highlight the problem.
Additionally we can observe (err2.log attached):
{noformat}
2022-10-24 13:23:52:308 +0300
[WARNING][%iist_n_1%Raft-Group-Client-4][RaftGroupServiceImpl] Recoverable
error during the request type=ActionRequestImpl occurred (will be retried on
the randomly selected node):
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
... 7 more
2022-10-24 13:24:13:437 +0300
[WARNING][%iist_n_1%Raft-Group-Client-2][RaftGroupServiceImpl] Recoverable
error during the request type=ActionRequestImpl occurred (will be retried on
the randomly selected node):
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
at
java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
at
java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
at
java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at
java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException
... 7 more
{noformat}
> ReplicaUnavailableException: Replica is not ready.
> --------------------------------------------------
>
> Key: IGNITE-17959
> URL: https://issues.apache.org/jira/browse/IGNITE-17959
> Project: Ignite
> Issue Type: Bug
> Components: general
> Affects Versions: 3.0.0-alpha5
> Reporter: Evgeny Stanilovsky
> Assignee: Sergey Uttsel
> Priority: Major
> Labels: ignite-3
> Attachments: err.log, err2.log
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h2. *Motivation*
> Frequently in long running tests can be observed (full msg in attach) :
> {noformat}
> Caused by:
> org.apache.ignite.internal.replicator.exception.ReplicaUnavailableException:
> IGN-REP-5 TraceId:82267e0a-aca2-47a3-806e-7922ed61d6d3 Replica is not ready
> [replicationGroupId=b5b3a2e5-1342-4a90-97b3-a46e9509a1d6_part_5,
> nodeName=iist_n_1]
> {noformat}
> check for example test: ItIndexSpoolTest.test, numerous runs or run until
> failure will highlight the problem.
> Additionally we can observe (err2.log attached):
> {noformat}
> 2022-10-24 13:23:52:308 +0300
> [WARNING][%iist_n_1%Raft-Group-Client-4][RaftGroupServiceImpl] Recoverable
> error during the request type=ActionRequestImpl occurred (will be retried on
> the randomly selected node):
> java.util.concurrent.CompletionException:
> java.util.concurrent.TimeoutException
> at
> java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
> at
> java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
> at
> java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at
> java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.util.concurrent.TimeoutException
> ... 7 more
> 2022-10-24 13:24:13:437 +0300
> [WARNING][%iist_n_1%Raft-Group-Client-2][RaftGroupServiceImpl] Recoverable
> error during the request type=ActionRequestImpl occurred (will be retried on
> the randomly selected node):
> java.util.concurrent.CompletionException:
> java.util.concurrent.TimeoutException
> at
> java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367)
> at
> java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376)
> at
> java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> at
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
> at
> java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.util.concurrent.TimeoutException
> ... 7 more
> {noformat}
> h2. *Implementation Notes*
> The root cause is that there is a race between a table creation and an
> operation on the table. So it's possible that Replica isn't created in
> ReplicaManager when rw/ro operation is invoked. It's need to retry the
> operation several times if ReplicaUnavailableException is thrown.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)