[ 
https://issues.apache.org/jira/browse/IGNITE-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladislav Pyatkov updated IGNITE-20640:
---------------------------------------
    Description: 
This behavior causes any RAFT operation to get stuck because a leader cannot be
elected. The issue reproduces in the test
ItDataSchemaSyncTest#checkSchemasCorrectlyRestore. To make it visible in the
log, add the following assertion:

{code:title='Loza#startRaftGroupNodeInternal'}

assert configuration.peers().contains(nodeId.peer())
        || configuration.learners().contains(nodeId.peer())
        : "Raft node started on a peer where it should not be";

{code}
{noformat}
[2023-10-10T20:51:51,154][ERROR][%node0%tableManager-io-11][WatchProcessor] Error occurred when processing a watch event
java.lang.AssertionError: Raft node started on a peer where it should not be
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:361) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:252) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:225) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:1986) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$90(TableManager.java:1878) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:805) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$91(TableManager.java:1848) ~[main/:?]
    at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}

  was:
This behavior causes any RAFT operation to get stuck because a leader cannot be
elected. The issue reproduces in the test
ItDataSchemaSyncTest#checkSchemasCorrectlyRestore. To make it visible in the
log, add the following assertion:

{code:title="Loza#startRaftGroupNodeInternal"}

assert configuration.peers().contains(nodeId.peer())
        || configuration.learners().contains(nodeId.peer())
        : "Raft node started on a peer where it should not be";

{code}
{noformat}
[2023-10-10T20:51:51,154][ERROR][%node0%tableManager-io-11][WatchProcessor] Error occurred when processing a watch event
java.lang.AssertionError: Raft node started on a peer where it should not be
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:361) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:252) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:225) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:1986) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$90(TableManager.java:1878) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:805) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$91(TableManager.java:1848) ~[main/:?]
    at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}


> Raft node started in a node where it should not be
> --------------------------------------------------
>
>                 Key: IGNITE-20640
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20640
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>
> This behavior causes any RAFT operation to get stuck because a leader cannot
> be elected. The issue reproduces in the test
> ItDataSchemaSyncTest#checkSchemasCorrectlyRestore. To make it visible in the
> log, add the following assertion:
> {code:title='Loza#startRaftGroupNodeInternal'}
> assert configuration.peers().contains(nodeId.peer())
>         || configuration.learners().contains(nodeId.peer())
>         : "Raft node started on a peer where it should not be";
> {code}
> {noformat}
> [2023-10-10T20:51:51,154][ERROR][%node0%tableManager-io-11][WatchProcessor] Error occurred when processing a watch event
> java.lang.AssertionError: Raft node started on a peer where it should not be
>     at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:361) ~[main/:?]
>     at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:252) ~[main/:?]
>     at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:225) ~[main/:?]
>     at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:1986) ~[main/:?]
>     at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$90(TableManager.java:1878) ~[main/:?]
>     at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:805) ~[main/:?]
>     at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$91(TableManager.java:1848) ~[main/:?]
>     at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>     at java.lang.Thread.run(Thread.java:834) [?:?]
> {noformat}
> {noformat}
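
For readers without the Ignite sources at hand, the condition that the assertion in the description checks can be sketched in isolation. This is only an illustration under stated assumptions: the class RaftMembershipCheck, the method belongsToGroup, and the use of plain strings as peer identifiers are hypothetical and are not part of Loza or any Ignite API. An explicit boolean check is used rather than the assert keyword, since Java assertions are only evaluated when the JVM runs with -ea.

```java
import java.util.Set;

/** Hypothetical, self-contained sketch; not the Ignite implementation. */
public class RaftMembershipCheck {
    /**
     * Returns true if the local node appears in the group configuration,
     * either as a voting peer or as a learner. Peer identities are modeled
     * as plain strings here (an assumption made for brevity).
     */
    public static boolean belongsToGroup(Set<String> peers, Set<String> learners, String localPeer) {
        return peers.contains(localPeer) || learners.contains(localPeer);
    }

    public static void main(String[] args) {
        Set<String> peers = Set.of("node1", "node2");
        Set<String> learners = Set.of("node3");

        // A voting peer and a learner both pass the check.
        System.out.println(belongsToGroup(peers, learners, "node1")); // true
        System.out.println(belongsToGroup(peers, learners, "node3")); // true

        // A node outside the configuration fails it; this corresponds to
        // the case the assertion in the description is meant to expose.
        System.out.println(belongsToGroup(peers, learners, "node0")); // false
    }
}
```

In this sketch the third call returns false for a node named node0, which is the same node name that appears in the log line above, although the sets used here are made up for the example.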



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
