[
https://issues.apache.org/jira/browse/IGNITE-20640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vladislav Pyatkov updated IGNITE-20640:
---------------------------------------
Description:
*Motivation*
Because nodes start simultaneously, tests may trigger several rebalances at
startup. After a rebalance finishes, the list of peers for the RAFT
replication group can differ from the initial one. The changed list of peers
should be propagated to RAFT clients, but this does not happen.
The method InternalTableImpl#updateInternalTableRaftGroupService updates
clients only on table start and does not account for later rebalances.
As a result, sending a RAFT command fails with a timeout exception
because the leader is absent from the client's list of peers:
{noformat}
[2023-10-13T15:16:18,120][INFO ][%node1%tableManager-io-13][Loza] Start new raft node=RaftNodeId [groupId=3_part_12, peer=Peer [consistentId=node1, idx=0]] with initial configuration=PeersAndLearners [peers=Set12 [Peer [consistentId=node1, idx=0]], learners=SetN []]
[2023-10-13T15:16:18,472][INFO ][%node2%tableManager-io-14][Loza] Start new raft node=RaftNodeId [groupId=3_part_12, peer=Peer [consistentId=node2, idx=0]] with initial configuration=PeersAndLearners [peers=Set12 [Peer [consistentId=node1, idx=0]], learners=SetN []]
...
[2023-10-13T15:16:18,661][ERROR][%node1%JRaft-Request-Processor-21][RpcRequestProcessor] handleRequest ChangePeersAsyncRequestImpl [groupId=3_part_12, leaderId=node1, newLearnersList=ArrayList [], newPeersList=ArrayList [node2], term=2] failed
java.lang.IllegalStateException: Not leader
    at org.apache.ignite.raft.jraft.core.NodeImpl.listPeers(NodeImpl.java:3293) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.ChangePeersAsyncRequestProcessor.processRequest0(ChangePeersAsyncRequestProcessor.java:55) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.ChangePeersAsyncRequestProcessor.processRequest0(ChangePeersAsyncRequestProcessor.java:36) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.BaseCliRequestProcessor.processRequest(BaseCliRequestProcessor.java:112) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:49) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:29) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$RpcMessageHandler.lambda$onReceived$0(IgniteRpcServer.java:194) ~[main/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}
*Definition of done*
After a rebalance, the peer list held by RAFT clients should be updated, and
RAFT commands should be applied successfully.
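The expected behaviour can be illustrated with a minimal sketch. It is not Ignite code: the class RaftClientSketch and its methods onRebalanceFinished and canReach are hypothetical names introduced only for illustration, and the real fix may look quite different. The sketch assumes the client keeps its own copy of the peer list and that some hook fires when a rebalance finishes.

```java
import java.util.Set;

/** Hypothetical sketch, not Ignite code: a client-side view of a RAFT group's peers. */
public class RaftClientSketch {
    /** Peers this client may address; replaced as a whole on every rebalance. */
    private volatile Set<String> peers;

    public RaftClientSketch(Set<String> initialPeers) {
        this.peers = Set.copyOf(initialPeers);
    }

    /** Hypothetical hook, assumed to be invoked when a rebalance finishes. */
    public void onRebalanceFinished(Set<String> newPeers) {
        this.peers = Set.copyOf(newPeers);
    }

    /** A command can be routed only to a leader the client knows about. */
    public boolean canReach(String leaderId) {
        return peers.contains(leaderId);
    }

    public static void main(String[] args) {
        // The client is created on table start with the initial configuration.
        RaftClientSketch client = new RaftClientSketch(Set.of("node1"));

        // The rebalance moves the group to node2 (newPeersList in the log above).
        System.out.println(client.canReach("node2")); // stale peer list

        // Refreshing the peer list after the rebalance makes the new leader reachable.
        client.onRebalanceFinished(Set.of("node2"));
        System.out.println(client.canReach("node2")); // refreshed peer list
    }
}
```

Without the call to onRebalanceFinished, the client keeps addressing only node1, which corresponds to the timeout described in the motivation.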
was:
Due to nodes starting simultaneously, tests may have several rebalances at the
start. After the rebalance is finished, the list of peers for the raft
replication group can be different. The changed list of peers should apply to
RAFT clients, but it does not happen.
The method (InternalTableImpl#updateInternalTableRaftGroupService) updates
clients only on table start and does not consider a further rebalance.
Currently, we try to send a raft command but receive a timeout exception
because the leader is absent from the list of peers:
{noformat}
[2023-10-13T15:16:18,120][INFO ][%node1%tableManager-io-13][Loza] Start new raft node=RaftNodeId [groupId=3_part_12, peer=Peer [consistentId=node1, idx=0]] with initial configuration=PeersAndLearners [peers=Set12 [Peer [consistentId=node1, idx=0]], learners=SetN []]
[2023-10-13T15:16:18,472][INFO ][%node2%tableManager-io-14][Loza] Start new raft node=RaftNodeId [groupId=3_part_12, peer=Peer [consistentId=node2, idx=0]] with initial configuration=PeersAndLearners [peers=Set12 [Peer [consistentId=node1, idx=0]], learners=SetN []]
...
[2023-10-13T15:16:18,661][ERROR][%node1%JRaft-Request-Processor-21][RpcRequestProcessor] handleRequest ChangePeersAsyncRequestImpl [groupId=3_part_12, leaderId=node1, newLearnersList=ArrayList [], newPeersList=ArrayList [node2], term=2] failed
java.lang.IllegalStateException: Not leader
    at org.apache.ignite.raft.jraft.core.NodeImpl.listPeers(NodeImpl.java:3293) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.ChangePeersAsyncRequestProcessor.processRequest0(ChangePeersAsyncRequestProcessor.java:55) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.ChangePeersAsyncRequestProcessor.processRequest0(ChangePeersAsyncRequestProcessor.java:36) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.cli.BaseCliRequestProcessor.processRequest(BaseCliRequestProcessor.java:112) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:49) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.RpcRequestProcessor.handleRequest(RpcRequestProcessor.java:29) ~[main/:?]
    at org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$RpcMessageHandler.lambda$onReceived$0(IgniteRpcServer.java:194) ~[main/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}
> RAFT client does not change peers after rebalance
> -------------------------------------------------
>
> Key: IGNITE-20640
> URL: https://issues.apache.org/jira/browse/IGNITE-20640
> Project: Ignite
> Issue Type: Bug
> Reporter: Vladislav Pyatkov
> Priority: Major
>