[ https://issues.apache.org/jira/browse/IGNITE-24857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov updated IGNITE-24857:
----------------------------------
Epic Link: IGNITE-24900
> Possible race on assignments recovery when using Volatile Storage Profile
> -------------------------------------------------------------------------
>
> Key: IGNITE-24857
> URL: https://issues.apache.org/jira/browse/IGNITE-24857
> Project: Ignite
> Issue Type: Bug
> Reporter: Aleksandr Polovtsev
> Priority: Major
> Labels: ignite-3
>
> The following error is printed in logs when running the
> {{ItTableRaftSnapshotsTest#testDataRecoveryAfterSnapshot}} test with the
> {{VolatilePageMemoryStorageEngine}}:
> {code:java}
> java.util.concurrent.CompletionException: java.lang.AssertionError: The
> local node is outside of the replication group [, stable=Assignments
> [nodes=HashSet [Assignment [consistentId=itrst_tdras_3344, isPeer=true],
> Assignment [consistentId=itrst_tdras_3345, isPeer=true], Assignment
> [consistentId=itrst_tdras_3346, isPeer=true]], force=false,
> timestamp=114188891807809537, fromReset=false], pending=Assignments
> [nodes=HashSet [Assignment [consistentId=itrst_tdras_3344, isPeer=true],
> Assignment [consistentId=itrst_tdras_3345, isPeer=true], Assignment
> [consistentId=itrst_tdras_3346, isPeer=true]], force=false,
> timestamp=114188891807809537, fromReset=false], localName=itrst_tdras_3346].
>     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire$$$capture(CompletableFuture.java:722) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java) [?:?]
>     at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) [?:?]
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
>     at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
> Caused by: java.lang.AssertionError: The local node is outside of the
> replication group [, stable=Assignments [nodes=HashSet [Assignment
> [consistentId=itrst_tdras_3344, isPeer=true], Assignment
> [consistentId=itrst_tdras_3345, isPeer=true], Assignment
> [consistentId=itrst_tdras_3346, isPeer=true]], force=false,
> timestamp=114188891807809537, fromReset=false], pending=Assignments
> [nodes=HashSet [Assignment [consistentId=itrst_tdras_3344, isPeer=true],
> Assignment [consistentId=itrst_tdras_3345, isPeer=true], Assignment
> [consistentId=itrst_tdras_3346, isPeer=true]], force=false,
> timestamp=114188891807809537, fromReset=false], localName=itrst_tdras_3346].
>     at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$141(TableManager.java:2462) ~[main/:?]
>     at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:905) ~[main/:?]
>     at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$142(TableManager.java:2433) ~[main/:?]
>     at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire$$$capture(CompletableFuture.java:718) ~[?:?]
>     ... 5 more
> {code}
> After a brief investigation, it looks like there may be a race between
> handling assignments from Meta Storage events and local assignments
> recovery, which leaves the node present in the stable assignments without
> having started the corresponding replica.
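To make the suspected ordering problem concrete, below is a minimal, hypothetical Java model. All names (`AssignmentsRaceSketch`, `recoverLocalAssignments`, `naiveCheck`, `guardedCheck`) are invented for illustration and are not Ignite APIs. It assumes the race is that the Meta Storage event handler checks local replica state before local recovery has started the replica; `guardedCheck` shows one possible mitigation (chaining the event handling on recovery completion). This is a sketch under those assumptions, not the actual fix.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the suspected race; not real Ignite classes.
class AssignmentsRaceSketch {
    private final String localName;
    // Partition id -> marker that the local replica has been started.
    private final ConcurrentMap<Integer, Boolean> startedReplicas = new ConcurrentHashMap<>();
    // Completed once local assignments recovery (on node start) has finished.
    private final CompletableFuture<Void> recoveryDone = new CompletableFuture<>();

    AssignmentsRaceSketch(String localName) {
        this.localName = localName;
    }

    // Idempotent start, so recovery and event handling can both call it safely.
    void startReplicaIfAbsent(int partId) {
        startedReplicas.putIfAbsent(partId, Boolean.TRUE);
    }

    // Recovery path: start a replica for every partition whose stable assignments include this node.
    void recoverLocalAssignments(Map<Integer, Set<String>> stableByPartition) {
        stableByPartition.forEach((partId, nodes) -> {
            if (nodes.contains(localName)) {
                startReplicaIfAbsent(partId);
            }
        });
        recoveryDone.complete(null);
    }

    // Invariant checked by the event handler: "in stable assignments => replica started".
    // Evaluated before recovery has run, this reports a violation, mirroring the assertion.
    boolean naiveCheck(int partId, Set<String> stable) {
        return !stable.contains(localName) || startedReplicas.containsKey(partId);
    }

    // Assumed mitigation: evaluate the invariant only after recovery has completed.
    CompletableFuture<Boolean> guardedCheck(int partId, Set<String> stable) {
        return recoveryDone.thenApply(ignored -> naiveCheck(partId, stable));
    }

    public static void main(String[] args) {
        Set<String> stable = Set.of("itrst_tdras_3344", "itrst_tdras_3345", "itrst_tdras_3346");
        AssignmentsRaceSketch node = new AssignmentsRaceSketch("itrst_tdras_3346");

        // Event handled before recovery: node is in stable assignments, but no replica yet.
        System.out.println(node.naiveCheck(0, stable));

        // Guarded handling waits for recovery instead of failing.
        CompletableFuture<Boolean> guarded = node.guardedCheck(0, stable);
        System.out.println(guarded.isDone());

        node.recoverLocalAssignments(Map.of(0, stable));
        System.out.println(guarded.join());
    }
}
```

Running `main` should print `false`, `false`, `true`: the unguarded check fails when the event wins the race, while the guarded one is deferred until recovery has started the replica.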