[jira] [Updated] (IGNITE-20640) Raft node started in a node where it should not be

Vladislav Pyatkov (Jira) Thu, 12 Oct 2023 13:47:04 -0700


     [ https://issues.apache.org/jira/browse/IGNITE-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vladislav Pyatkov updated IGNITE-20640:
---------------------------------------
    Description: 
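This behavior causes any Raft operation on the group to get stuck, because a 
leader cannot be elected. In the log line below, node1 starts a Raft node for 
group 3_part_15 although the initial configuration lists only node2 as a peer: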
{noformat}
[2023-10-10T16:48:48,771][INFO ][%node1%tableManager-io-3][Loza] Start new raft node=RaftNodeId [groupId=3_part_15, peer=Peer [consistentId=node1, idx=0]] with initial configuration=PeersAndLearners [peers=Set12 [Peer [consistentId=node2, idx=0]], learners=SetN []]
{noformat}
The issue is reproduced by the test 
ItDataSchemaSyncTest#checkSchemasCorrectlyRestore. To make it visible in the 
log, add an assertion:

{code:title=Loza#startRaftGroupNodeInternal}
assert configuration.peers().contains(nodeId.peer()) || configuration.learners().contains(nodeId.peer())
        : "Raft node started on a peer where it should not be";
{code}
{noformat}
[2023-10-10T20:51:51,154][ERROR][%node0%tableManager-io-11][WatchProcessor] Error occurred when processing a watch event
 java.lang.AssertionError: Raft node started on a peer where it should not be
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:361) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:252) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:225) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:1986) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$90(TableManager.java:1878) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:805) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$91(TableManager.java:1848) ~[main/:?]
    at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}
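The membership check behind that assertion can be sketched in isolation. This 
is only an illustration, not the Loza code: GroupConfiguration, 
RaftMembershipCheck, hosts and requireMember are hypothetical names, and nodes 
are identified by plain consistent-id strings instead of Peer objects. The 
values in main come from the log line above (node1 starts a group whose only 
peer is node2).

```java
import java.util.Set;

/** Hypothetical stand-in for a Raft group configuration; NOT Ignite's PeersAndLearners. */
final class GroupConfiguration {
    private final Set<String> peers;
    private final Set<String> learners;

    GroupConfiguration(Set<String> peers, Set<String> learners) {
        // Defensive immutable copies, so later changes by the caller do not affect the check.
        this.peers = Set.copyOf(peers);
        this.learners = Set.copyOf(learners);
    }

    /** Returns true if the node is listed either as a voting peer or as a learner. */
    boolean hosts(String consistentId) {
        return peers.contains(consistentId) || learners.contains(consistentId);
    }
}

/** Hypothetical helper mirroring the condition of the assertion in the report. */
final class RaftMembershipCheck {
    /** Throws if a Raft node would be started on a node that is not part of the configuration. */
    static void requireMember(String localConsistentId, GroupConfiguration configuration) {
        if (!configuration.hosts(localConsistentId)) {
            throw new IllegalStateException(
                    "Raft node started on a peer where it should not be: " + localConsistentId);
        }
    }

    public static void main(String[] args) {
        // Configuration from the reported log line: peers = {node2}, learners = {}.
        GroupConfiguration configuration = new GroupConfiguration(Set.of("node2"), Set.of());

        System.out.println(configuration.hosts("node2")); // prints "true"
        System.out.println(configuration.hosts("node1")); // prints "false"
    }
}
```

The sketch throws an exception instead of using assert because Java assertions 
are evaluated only when the JVM runs with -ea; an explicit check fires in every 
run.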

  was:
This behavior leads to getting stuck in any RAFT operation because the leader 
cannot be elected. This issue is reproduced in the test 
ItDataSchemaSyncTest#checkSchemasCorrectlyRestore, to test it in a log just add 
an assertion:

{code:title=Loza#startRaftGroupNodeInternal}
assert configuration.peers().contains(nodeId.peer()) || configuration.learners().contains(nodeId.peer())
        : "Raft node started on a peer where it should not be";
{code}
{noformat}
[2023-10-10T20:51:51,154][ERROR][%node0%tableManager-io-11][WatchProcessor] Error occurred when processing a watch event
 java.lang.AssertionError: Raft node started on a peer where it should not be
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:361) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:252) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:225) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:1986) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$90(TableManager.java:1878) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:805) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$91(TableManager.java:1848) ~[main/:?]
    at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:783) [?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}


> Raft node started in a node where it should not be
> --------------------------------------------------
>
>                 Key: IGNITE-20640
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20640
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major



