[ https://issues.apache.org/jira/browse/IGNITE-17064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-17064:
---------------------------------
    Description:
ItRaftGroupServiceTest#testTransferLeadership is flaky and fails with
{noformat}
java.util.concurrent.ExecutionException: class org.apache.ignite.raft.jraft.rpc.impl.RaftException: EBUSY:Changing the configuration
{noformat}
A comment in {{NodeImpl#transferLeadershipTo}} explains why the {{EBUSY:Changing the configuration}} status is returned:

{noformat}
        if (this.confCtx.isBusy()) {
            // It's very messy to deal with the case when the |peer| received
            // TimeoutNowRequest and increase the term while somehow another leader
            // which was not replicated with the newest configuration has been
            // elected. If no add_peer with this very |peer| is to be invoked ever
            // after nor this peer is to be killed, this peer will spin in the voting
            // procedure and make the each new leader stepped down when the peer
            // reached vote timeout and it starts to vote (because it will increase
            // the term of the group)
            // To make things simple, refuse the operation and force users to
            // invoke transfer_leadership_to after configuration changing is
            // completed so that the peer's configuration is up-to-date when it
            // receives the TimeOutNowRequest.
            LOG.warn(
                "Node {} refused to transfer leadership to peer {} when the leader is changing the configuration.",
                getNodeId(), peer);
            return new Status(RaftError.EBUSY, "Changing the configuration");
        }
{noformat}
The current limitation must be investigated.
The easiest way to fix the test seems to be rewriting it so that it repeats the transfer-leadership invocation while the leader reports EBUSY.
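The retry suggested in the description could look roughly like the sketch below. It is not the Ignite test code: {{BusyException}}, {{retryWhileBusy}} and {{simulatedTransfer}} are hypothetical names introduced only for illustration, and the real test would need to inspect the actual {{RaftException}} and its error code instead. The sketch assumes the transfer call returns a {{CompletableFuture}} and that it is safe to simply re-invoke it after a short pause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class TransferLeadershipRetry {
    /** Hypothetical stand-in for the RaftException that carries RaftError.EBUSY. */
    static final class BusyException extends RuntimeException {
        BusyException(String message) {
            super(message);
        }
    }

    /**
     * Calls the operation until it succeeds, retrying only when it fails with BusyException.
     * Returns the number of attempts made. Any other failure, or a BusyException on the
     * last allowed attempt, is rethrown to the caller.
     */
    static int retryWhileBusy(Supplier<CompletableFuture<Void>> operation, int maxAttempts, long delayMillis)
            throws ExecutionException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                operation.get().get(); // Block until this attempt completes.
                return attempt;
            } catch (ExecutionException e) {
                boolean busy = e.getCause() instanceof BusyException;
                if (!busy || attempt >= maxAttempts) {
                    throw e;
                }
                // Give the configuration change time to finish before retrying.
                Thread.sleep(delayMillis);
            }
        }
    }

    /** Simulated transfer: fails with EBUSY for the first busyCalls invocations, then succeeds. */
    static Supplier<CompletableFuture<Void>> simulatedTransfer(AtomicInteger calls, int busyCalls) {
        return () -> {
            CompletableFuture<Void> result = new CompletableFuture<>();
            if (calls.incrementAndGet() <= busyCalls) {
                result.completeExceptionally(new BusyException("EBUSY:Changing the configuration"));
            } else {
                result.complete(null);
            }
            return result;
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        int attempts = retryWhileBusy(simulatedTransfer(calls, 2), 5, 10L);
        System.out.println("Transfer succeeded after " + attempts + " attempts");
    }
}
```

Bounding the number of attempts keeps the test from hanging if the configuration change never completes. Retrying only on the busy error keeps unrelated failures visible.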
  was:
ItRaftGroupServiceTest#testTransferLeadership is flaky with {{java.util.concurrent.ExecutionException: class org.apache.ignite.raft.jraft.rpc.impl.RaftException: EBUSY:Changing the configuration}}

There is a comment in {{NodeImpl#transferLeadershipTo}} about the situation when EBUSY:Changing the configuration is thrown:

{noformat}
        if (this.confCtx.isBusy()) {
            // It's very messy to deal with the case when the |peer| received
            // TimeoutNowRequest and increase the term while somehow another leader
            // which was not replicated with the newest configuration has been
            // elected. If no add_peer with this very |peer| is to be invoked ever
            // after nor this peer is to be killed, this peer will spin in the voting
            // procedure and make the each new leader stepped down when the peer
            // reached vote timeout and it starts to vote (because it will increase
            // the term of the group)
            // To make things simple, refuse the operation and force users to
            // invoke transfer_leadership_to after configuration changing is
            // completed so that the peer's configuration is up-to-date when it
            // receives the TimeOutNowRequest.
            LOG.warn(
                "Node {} refused to transfer leadership to peer {} when the leader is changing the configuration.",
                getNodeId(), peer);
            return new Status(RaftError.EBUSY, "Changing the configuration");
        }
{noformat}
The current limitation must be investigated.
Seems like the easiest way to fix test is to rewrite it and repeat transfer leadership invocation.
> ItRaftGroupServiceTest#testTransferLeadership is flaky
> ------------------------------------------------------
>
>                 Key: IGNITE-17064
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17064
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Priority: Blocker
>              Labels: ignite-3