Hi Snehasish,

> Good news everyone! This issue is no longer reproducible by using *3.3.0-SNAPSHOT

This is really good news. Thanks a lot for testing Ratis and digging out the problem!

Tsz-Wo

On Wed, Jan 21, 2026 at 12:08 AM Snehasish Roy <[email protected]> wrote:

> Hello,
>
> Good news everyone! This issue is no longer reproducible by using *3.3.0-SNAPSHOT* version.
> Based on the merged PRs, I think https://github.com/apache/ratis/pull/1331 solved the issue of listener role transition.
>
> Thank you for looking into this.
>
> Regards,
> Snehasish
>
> On Sat, 17 Jan 2026 at 01:29, Tsz Wo Sze <[email protected]> wrote:
>
> > > ... : n4@group-ABB3109A44C2 replies to PRE_VOTE vote request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state: n4@group-ABB3109A44C2:t1, leader=n1, voted=null, raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1, i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> >
> > > ... : n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE, n2, group-ABB3109A44C2, 1, (t:1, i:16))
> >
> > > ... : n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a listener, who is a non-voting member
> >
> > According to the above log, you actually removed n3. Somehow n4 rejected the voteRequest from n2 and said that it was a non-voting member.
> >
> > It seems like a bug. Could you share all the commands you have run? I could try reproducing it.
> >
> > Thanks.
> > Tsz-Wo
> >
> > On Fri, Jan 16, 2026 at 11:47 AM Tsz Wo Sze <[email protected]> wrote:
> >
> > > Hi Snehasish,
> > >
> > > Thanks for providing the details!
> > >
> > > > 3. After killing n3 and promoting n4 as follower
> > > > ...
> > > > 4. After killing n1 (leader) instance
> > >
> > > As Xinyu mentioned, you probably have changed the voting member size from 3 to 4. So, killing two servers, n3 and n1, makes it impossible to elect a new leader since a majority requires 3 voting members.
> > >
> > > I guess you were setConf from [follower: n1,n2,n3; listener: n4]
> > > - to [follower: n1,n2,n3,n4]
> > >
> > > In order to keep voting member size 3, I suggest you setConf
> > > - to [follower: n1,n2,n4; Listener n3]
> > > (or removing n3, i.e. setConf to [follower: n1,n2, n4])
> > >
> > > Hope it helps.
> > > Tsz-Wo
> > >
> > > On Thu, Jan 15, 2026 at 9:44 PM Xinyu Tan <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> Okay, I understand your question!
> > >>
> > >> Congratulations, you may have found a potential bug.
> > >>
> > >> Next, you can further investigate Ratis's leader election mechanism and see why the other two followers refused to vote!
> > >>
> > >> You may want to look at the code corresponding to the logs, such as [1][2].
> > >>
> > >> [1] https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1428
> > >> [2] https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java#L442-L481
> > >>
> > >> Best
> > >> --------------
> > >> Xinyu Tan
> > >>
> > >> On 2026/01/16 05:16:04 Snehasish Roy wrote:
> > >> > Hello,
> > >> >
> > >> > Thank you for your prompt response.
> > >> >
> > >> > > You only killed n3 instead of removing it from the cluster, and n1 and n2 formed the quorum.
> > >> > I did remove n3 from the cluster before promoting n4 to the follower. This was successful because n1 and n2 were still online.
> > >> >
> > >> > This is evident from the below info which shows n1, n2 and n4 are in the cluster.
> > >> > > > >> > ``` > > >> > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001, > 0.0.0.0:9002 > > , > > >> > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found > > >> > multiple MetricRegistries: [class > > >> > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class > > >> > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using > > the > > >> > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl > > >> > group id: 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > leader info: n1(0.0.0.0:9000) > > >> > > > >> > [server { > > >> > id: "n1" > > >> > address: "0.0.0.0:9000" > > >> > startupRole: FOLLOWER > > >> > } > > >> > commitIndex: 16 > > >> > , server { > > >> > id: "n2" > > >> > address: "0.0.0.0:9001" > > >> > startupRole: FOLLOWER > > >> > } > > >> > commitIndex: 16 > > >> > , server { > > >> > id: "n4" > > >> > address: "0.0.0.0:9003" > > >> > startupRole: FOLLOWER > > >> > } > > >> > commitIndex: 16 > > >> > ] > > >> > applied { > > >> > term: 1 > > >> > index: 16 > > >> > } > > >> > committed { > > >> > term: 1 > > >> > index: 16 > > >> > } > > >> > lastEntry { > > >> > term: 1 > > >> > index: 16 > > >> > } > > >> > ``` > > >> > > > >> > After killing n1, the logs from n2 also list the configuration which > > >> > clearly shows the peers as n1, n2 and n4. > > >> > Logs also demonstrates that n2 is asking for votes from n1 and n4 - > > not > > >> > from n3, indicating the cluster is only a 3 node cluster. 
> > >> >
> > >> > ```
> > >> > INFO [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE round 0: submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> > >> > ```
> > >> >
> > >> > Please let me know if I misunderstood something.
> > >> >
> > >> > Regards,
> > >> > Snehasish
> > >> >
> > >> > On Fri, 16 Jan 2026 at 08:49, Xinyu Tan <[email protected]> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > In your scenario, there are two phenomena:
> > >> > >
> > >> > > Phenomenon 1
> > >> > > You only killed n3 instead of removing it from the cluster, and n1 and n2 formed the quorum. As a result, compared to the last test, you were able to successfully promote n4 from listener to follower, which is as expected because, during the member change, the original quorum of n1, n2, and n3, i.e., n1 and n2, were still online.
> > >> > >
> > >> > > Phenomenon 2
> > >> > > It is important to note that once you promote n4 to a follower, the group members become n1, n2, n3, and n4. The quorum is now 3 instead of 2, and since n3 has already been killed, killing n1 at this point would cause the consensus group to fail to form a quorum of 3 members, making it impossible to elect a new leader. If you wish to perform this action, you can try removing the killed n3 from the group first. This way, the consensus group will only consist of n1, n2, and n4, and the quorum will be 2. At this point, killing n1 should allow the election of a new leader, as the quorum of 2 members is still online.
> > >> > >
> > >> > > Your test scenario involves the most complex part of the consensus algorithm: member changes. I think you should take a closer look at the PhD thesis of the Raft authors[1], which is more detailed than the ATC2014 conference version. I believe that after reading it, you will have a deeper understanding of the Raft algorithm!
> > >> > >
> > >> > > Looking forward to your next test.
> > >> > >
> > >> > > [1] https://github.com/ongardie/dissertation/blob/master/stanford.pdf
> > >> > >
> > >> > > Best
> > >> > > -----------------
> > >> > > Xinyu Tan
> > >> > >
> > >> > > On 2026/01/15 15:20:24 Snehasish Roy wrote:
> > >> > > > Hello,
> > >> > > >
> > >> > > > Based on your inputs, I was able to reproduce the issue consistently.
> > >> > > >
> > >> > > > 1. After starting n1, n2 and n3 nodes
> > >> > > >
> > >> > > > ```
> > >> > > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> > >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found multiple MetricRegistries: [class org.apache.ratis.metrics.impl.MetricRegistriesImpl, class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl].
> > >> Using the > > >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl > > >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > leader info: n1(0.0.0.0:9000) > > >> > > > > > >> > > > [server { > > >> > > > id: "n1" > > >> > > > address: "0.0.0.0:9000" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 8 > > >> > > > , server { > > >> > > > id: "n2" > > >> > > > address: "0.0.0.0:9001" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 8 > > >> > > > , server { > > >> > > > id: "n3" > > >> > > > address: "0.0.0.0:9002" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 8 > > >> > > > ] > > >> > > > applied { > > >> > > > term: 1 > > >> > > > index: 8 > > >> > > > } > > >> > > > committed { > > >> > > > term: 1 > > >> > > > index: 8 > > >> > > > } > > >> > > > lastEntry { > > >> > > > term: 1 > > >> > > > index: 8 > > >> > > > } > > >> > > > ``` > > >> > > > > > >> > > > 2. After adding n4 as listener > > >> > > > > > >> > > > ``` > > >> > > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001, > > 0.0.0.0:9002 > > >> , > > >> > > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - > > Found > > >> > > > multiple MetricRegistries: [class > > >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class > > >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. 
> > >> Using the > > >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl > > >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > leader info: n1(0.0.0.0:9000) > > >> > > > > > >> > > > [server { > > >> > > > id: "n1" > > >> > > > address: "0.0.0.0:9000" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 12 > > >> > > > , server { > > >> > > > id: "n2" > > >> > > > address: "0.0.0.0:9001" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 12 > > >> > > > , server { > > >> > > > id: "n3" > > >> > > > address: "0.0.0.0:9002" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 12 > > >> > > > , server { > > >> > > > id: "n4" > > >> > > > address: "0.0.0.0:9003" > > >> > > > startupRole: LISTENER > > >> > > > } > > >> > > > commitIndex: 12 > > >> > > > ] > > >> > > > applied { > > >> > > > term: 1 > > >> > > > index: 12 > > >> > > > } > > >> > > > committed { > > >> > > > term: 1 > > >> > > > index: 12 > > >> > > > } > > >> > > > lastEntry { > > >> > > > term: 1 > > >> > > > index: 12 > > >> > > > } > > >> > > > ``` > > >> > > > > > >> > > > 3. After killing n3 and promoting n4 as follower > > >> > > > > > >> > > > ``` > > >> > > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001, > > >> 0.0.0.0:9002, > > >> > > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - > > Found > > >> > > > multiple MetricRegistries: [class > > >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class > > >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. 
> > >> Using the > > >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl > > >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > leader info: n1(0.0.0.0:9000) > > >> > > > > > >> > > > [server { > > >> > > > id: "n1" > > >> > > > address: "0.0.0.0:9000" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > , server { > > >> > > > id: "n2" > > >> > > > address: "0.0.0.0:9001" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > , server { > > >> > > > id: "n4" > > >> > > > address: "0.0.0.0:9003" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > ] > > >> > > > applied { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > committed { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > lastEntry { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > ``` > > >> > > > > > >> > > > 4. After killing n1 (leader) instance > > >> > > > > > >> > > > ``` > > >> > > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001, > > >> 0.0.0.0:9002, > > >> > > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - > > Found > > >> > > > multiple MetricRegistries: [class > > >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class > > >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. 
> > >> Using the > > >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl > > >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> UNAVAILABLE: > > >> > > io > > >> > > > exception > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupList(AdminProtocolServiceGrpc.java:573) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupList(GrpcClientProtocolClient.java:167) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:106) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.client.impl.GroupManagementImpl.list(GroupManagementImpl.java:69) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.shell.cli.CliUtils.lambda$getGroupId$1(CliUtils.java:118) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72) > > >> > > > at > > org.apache.ratis.shell.cli.CliUtils.getGroupId(CliUtils.java:117) > > >> > > > at > > >> > > > org.apache.ratis.shell.cli.sh > > >> > > 
.command.AbstractRatisCommand.run(AbstractRatisCommand.java:70) > > >> > > > at > > >> > > > org.apache.ratis.shell.cli.sh > > >> > > .group.GroupInfoCommand.run(GroupInfoCommand.java:47) > > >> > > > at > > >> org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104) > > >> > > > at org.apache.ratis.shell.cli.sh > > >> .RatisShell.main(RatisShell.java:62) > > >> > > > Caused by: > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: > > >> > > > Connection refused: /0.0.0.0:9000 > > >> > > > Caused by: java.net.ConnectException: Connection refused > > >> > > > at java.base/sun.nio.ch.Net.pollConnect(Native Method) > > >> > > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) > > >> > > > at > > >> > > > java.base/sun.nio.ch > > >> > > .SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) > > >> > > > at > > >> > > > > > >> > > > > >> > > > 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > > >> > > > at java.base/java.lang.Thread.run(Thread.java:833) > > >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> UNAVAILABLE: > > >> > > io > > >> > > > exception > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupInfo(AdminProtocolServiceGrpc.java:580) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupInfo(GrpcClientProtocolClient.java:173) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:110) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109) > > >> > > > at org.apache.ratis.client.impl.GroupManagementImpl.info > > >> > > > (GroupManagementImpl.java:79) > > >> > > > at > > >> > > > > > >> > > > > >> > > > 
org.apache.ratis.shell.cli.CliUtils.lambda$getGroupInfo$2(CliUtils.java:146) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72) > > >> > > > at > > >> org.apache.ratis.shell.cli.CliUtils.getGroupInfo(CliUtils.java:145) > > >> > > > at > > >> > > > org.apache.ratis.shell.cli.sh > > >> > > .command.AbstractRatisCommand.run(AbstractRatisCommand.java:71) > > >> > > > at > > >> > > > org.apache.ratis.shell.cli.sh > > >> > > .group.GroupInfoCommand.run(GroupInfoCommand.java:47) > > >> > > > at > > >> org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104) > > >> > > > at org.apache.ratis.shell.cli.sh > > >> .RatisShell.main(RatisShell.java:62) > > >> > > > Caused by: > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: > > >> > > > Connection refused: /0.0.0.0:9000 > > >> > > > Caused by: java.net.ConnectException: Connection refused > > >> > > > at java.base/sun.nio.ch.Net.pollConnect(Native Method) > > >> > > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672) > > >> > > > at > > >> > > > java.base/sun.nio.ch > > >> > > .SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732) > > >> > > > at > > >> > > > > > >> > > > > >> > > 
> org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > > >> > > > at > > >> > > > > > >> > > > > >> > > > org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > > >> > > > at java.base/java.lang.Thread.run(Thread.java:833) > > >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2 > > >> > > > leader info: () > > >> > > > > > >> > > > [server { > > >> > > > id: "n2" > > >> > > > address: "0.0.0.0:9001" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > , server { > > >> > > > id: "n1" > > >> > > > address: "0.0.0.0:9000" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > , server { > > >> > > > id: "n4" > > >> > > > address: "0.0.0.0:9003" > > >> > > > startupRole: FOLLOWER > > >> > > > } > > >> > > > commitIndex: 16 > > >> > > > ] > > >> > > > applied { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > committed { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > lastEntry { > > >> > > > term: 1 > > >> > > > index: 16 > > >> > > > } > > >> > > > ``` > > >> > > > > > >> > > > Logs from n4 > > >> > > > ``` > > >> > > > INFO [2026-01-15 17:48:06,696] [grpc-default-executor-2] > > >> > > > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to > PRE_VOTE > > >> vote > > >> > > > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). 
Peer's state: > > >> > > > n4@group-ABB3109A44C2:t1, leader=n1, voted=null, > > >> > > > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog > > >> > > :OPENED:c16:last(t:1, > > >> > > > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2| > > >> > > 0.0.0.0:9001, > > >> > > > n4|0.0.0.0:9003]|listeners:[], old=null} > > >> > > > INFO [2026-01-15 17:48:06,897] [grpc-default-executor-2] > > >> > > > [RaftServer$Division]: n4@group-ABB3109A44C2: receive > > >> > > requestVote(PRE_VOTE, > > >> > > > n2, group-ABB3109A44C2, 1, (t:1, i:16)) > > >> > > > INFO [2026-01-15 17:48:06,897] [grpc-default-executor-2] > > >> [VoteContext]: > > >> > > > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this > > >> server is > > >> > > a > > >> > > > listener, who is a non-voting member > > >> > > > ``` > > >> > > > > > >> > > > > > >> > > > Logs from n2 > > >> > > > > > >> > > > ``` > > >> > > > INFO [2026-01-15 17:48:03,347] > > >> [n2@group-ABB3109A44C2-LeaderElection176 > > >> > > ] > > >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 > > PRE_VOTE > > >> > > round 0: > > >> > > > submit vote requests at term 1 for conf: {index: 15, > > cur=peers:[n1| > > >> > > > 0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], > > >> old=null} > > >> > > > INFO [2026-01-15 17:48:03,348] > > >> [n2@group-ABB3109A44C2-LeaderElection176 > > >> > > ] > > >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 got > > >> exception > > >> > > > when requesting votes: java.util.concurrent.ExecutionException: > > >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> UNAVAILABLE: > > >> > > io > > >> > > > exception > > >> > > > INFO [2026-01-15 17:48:03,352] > > >> [n2@group-ABB3109A44C2-LeaderElection176 > > >> > > ] > > >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176: > > PRE_VOTE > > >> > > > REJECTED received 1 response(s) and 1 exception(s): > > >> > > > INFO [2026-01-15 17:48:03,352] > > >> 
[n2@group-ABB3109A44C2-LeaderElection176 > > >> > > ] > > >> > > > [LeaderElection]: Response 0: n2<-n4#0:FAIL-t1-last:(t:1, > i:16) > > >> > > > INFO [2026-01-15 17:48:03,352] > > >> [n2@group-ABB3109A44C2-LeaderElection176 > > >> > > ] > > >> > > > [LeaderElection]: Exception 1: > > >> java.util.concurrent.ExecutionException: > > >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> UNAVAILABLE: > > >> > > io > > >> > > > exception > > >> > > > ``` > > >> > > > > > >> > > > This indicates that the cluster is in an unstable state. > > >> > > > I am willing to contribute, could you guide me a bit on this? > > >> > > > > > >> > > > > > >> > > > Regards, > > >> > > > Snehasish > > >> > > > > > >> > > > On Wed, 14 Jan 2026 at 08:44, Snehasish Roy < > > >> [email protected]> > > >> > > > wrote: > > >> > > > > > >> > > > > Hello, > > >> > > > > > > >> > > > > > > >> > > > > Thank you for your inputs. I will check and update this > thread. > > >> > > > > > > >> > > > > > > >> > > > > Regards, > > >> > > > > Snehasish > > >> > > > > > > >> > > > > On Wed, 7 Jan, 2026, 8:52 am Xinyu Tan, <[email protected]> > > >> wrote: > > >> > > > > > > >> > > > >> Hi,Snehasish > > >> > > > >> > > >> > > > >> In your scenario, if you kill n3, which is acting as a > > follower, > > >> the > > >> > > > >> cluster will have 3 non-listener and 1 listener, with one > > >> follower > > >> > > already > > >> > > > >> offline. At this point, the majority situation becomes quite > > >> risky > > >> > > because > > >> > > > >> if any non-listener goes down from here, the Raft group will > > not > > >> be > > >> > > able to > > >> > > > >> form a quorum and elect a new leader. > > >> > > > >> > > >> > > > >> Although you have promoted n4 to a listener and removed n3, > > >> before > > >> > > this > > >> > > > >> request completes, the majority of the Raft group is still 2. > > >> > > Therefore, > > >> > > > >> after you kill n1, a new leader cannot be elected. 
> > >> > > > >> In my understanding, this phenomenon is not a bug and aligns with the expected behavior of the algorithm.
> > >> > > > >>
> > >> > > > >> If you want to test how to safely promote a listener to a follower, make sure that before the promotion request completes (you can confirm this with shell commands as suggested by sze), the current leader and follower members maintain the majority online. Otherwise, the promotion action will not be successful, and this is not a problem with the implementation but a boundary of the Raft algorithm.
> > >> > > > >>
> > >> > > > >> Feel free to do more testing on this feature of Ratis. If you encounter the following issues, it would indicate that there is indeed a problem with the implementation, and we welcome discussions and contributions:
> > >> > > > >> * You find that even with the majority of leader and follower members online, you still cannot successfully promote a listener to a follower.
> > >> > > > >> * In your case, because the majority was not maintained, the member change failed. But after you restart n1 or n3 and re-establish the majority, the Raft group still cannot elect a leader or elects a leader but fails to perform member changes.
> > >> > > > >>
> > >> > > > >> We look forward to your testing.
> > >> > > > >>
> > >> > > > >> Best
> > >> > > > >> --------------
> > >> > > > >> Xinyu Tan
> > >> > > > >>
> > >> > > > >> On 2025/12/29 10:53:40 Snehasish Roy wrote:
> > >> > > > >> > Hello everyone,
> > >> > > > >> >
> > >> > > > >> > Happy Holidays.
> > >> > > > >> > This is my first email to this community so kindly excuse me for any mistakes.
> > >> > > > >> >
> > >> > > > >> > I initially started a 3-node Ratis cluster and then added a listener to the cluster using setConfiguration(List.of(n1,n2,n3), List.of(n4)), based on the following documentation:
> > >> > > > >> > https://jojochuang.github.io/ratis-site/docs/developer-guide/listeners
> > >> > > > >> >
> > >> > > > >> > ```
> > >> > > > >> > INFO [2025-12-29 15:57:01,887] [n1-server-thread1] [RaftServer$Division]: n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration SetConfigurationRequest:client-044D31187FB4->n1@group-ABB3109A44C2, cid=3, seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n3|0.0.0.0:9002], listeners:[n4|0.0.0.0:9003]
> > >> > > > >> > ```
> > >> > > > >> >
> > >> > > > >> > Then I killed one of the Ratis follower nodes (n3), followed by promoting the listener to a follower using the setConfiguration(List.of(n1,n2,n4)) command to maintain the cluster size of 3.
> > >> > > > >> > Please note that n3 has been removed from the list of followers, there are no more listeners in the cluster, and no failures were observed while issuing the command.
> > >> > > > >> > > > >> > > > >> > ``` > > >> > > > >> > INFO [2025-12-29 16:02:54,227] [n1-server-thread2] > > >> > > > >> [RaftServer$Division]: > > >> > > > >> > n1@group-ABB3109A44C2-LeaderStateImpl: > startSetConfiguration > > >> > > > >> > > > >> SetConfigurationRequest:client-2438CA24E2F3->n1@group-ABB3109A44C2, > > >> > > > >> cid=4, > > >> > > > >> > seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1| > > >> 0.0.0.0:9000, > > >> > > n2| > > >> > > > >> > 0.0.0.0:9001, n4|0.0.0.0:9003], listeners:[] > > >> > > > >> > ``` > > >> > > > >> > > > >> > > > >> > Then I killed the leader instance n1. Post which n2 > attempted > > >> to > > >> > > become > > >> > > > >> a > > >> > > > >> > leader and starts asking for votes from n1 and n4. There is > > no > > >> > > response > > >> > > > >> > from n1 as it's not alive and n4 is rejecting the pre_vote > > >> request > > >> > > from > > >> > > > >> n2 > > >> > > > >> > because it still thinks it's a listener. > > >> > > > >> > > > >> > > > >> > Logs from n2 > > >> > > > >> > ``` > > >> > > > >> > INFO [2025-12-29 16:04:10,051] > > >> > > [n2@group-ABB3109A44C2-LeaderElection30 > > >> > > > >> ] > > >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 > > >> PRE_VOTE > > >> > > > >> round 0: > > >> > > > >> > submit vote requests at term 1 for conf: {index: 15, > > >> cur=peers:[n1| > > >> > > > >> > 0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003 > > ]|listeners:[], > > >> > > old=null} > > >> > > > >> > INFO [2025-12-29 16:04:10,052] > > >> > > [n2@group-ABB3109A44C2-LeaderElection30 > > >> > > > >> ] > > >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 > got > > >> > > exception > > >> > > > >> when > > >> > > > >> > requesting votes: java.util.concurrent.ExecutionException: > > >> > > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> > > > >> UNAVAILABLE: io > > >> > > > >> > exception > > >> > > > >> > INFO [2025-12-29 16:04:10,054] > > >> > > 
[n2@group-ABB3109A44C2-LeaderElection30 > > >> > > > >> ] > > >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30: > > >> PRE_VOTE > > >> > > > >> REJECTED > > >> > > > >> > received 1 response(s) and 1 exception(s): > > >> > > > >> > INFO [2025-12-29 16:04:10,054] > > >> > > [n2@group-ABB3109A44C2-LeaderElection30 > > >> > > > >> ] > > >> > > > >> > [LeaderElection]: Response 0: n2<-n4#0:FAIL-t1-last:(t:1, > > >> i:16) > > >> > > > >> > INFO [2025-12-29 16:04:10,054] > > >> > > [n2@group-ABB3109A44C2-LeaderElection30 > > >> > > > >> ] > > >> > > > >> > [LeaderElection]: Exception 1: > > >> > > > >> java.util.concurrent.ExecutionException: > > >> > > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: > > >> > > > >> UNAVAILABLE: io > > >> > > > >> > exception > > >> > > > >> > ``` > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > Due to lack of leader, the cluster is no more stable. > > >> > > > >> > > > >> > > > >> > Logs from n4 > > >> > > > >> > ``` > > >> > > > >> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] > > >> > > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2: receive > > >> > > > >> requestVote(PRE_VOTE, > > >> > > > >> > n2, group-ABB3109A44C2, 1, (t:1, i:16)) > > >> > > > >> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] > > >> > > [VoteContext]: > > >> > > > >> > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: > > this > > >> > > server > > >> > > > >> is a > > >> > > > >> > listener, who is a non-voting member > > >> > > > >> > INFO [2025-12-29 16:05:03,405] [grpc-default-executor-2] > > >> > > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to > > >> PRE_VOTE > > >> > > vote > > >> > > > >> > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). 
Peer's state: > > >> > > > >> > n4@group-ABB3109A44C2:t1, leader=n1, voted=null, > > >> > > > >> > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog > > >> > > > >> :OPENED:c16:last(t:1, > > >> > > > >> > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, > > n2| > > >> > > > >> 0.0.0.0:9001, > > >> > > > >> > n4|0.0.0.0:9003]|listeners:[], old=null} > > >> > > > >> > ``` > > >> > > > >> > > > >> > > > >> > So my question is how to correctly promote a listener to a > > >> follower? > > >> > > > >> Did I > > >> > > > >> > miss some step? Or is there a bug in the code? If it's the > > >> latter, I > > >> > > > >> would > > >> > > > >> > be happy to contribute. Please let me know if you need any > > more > > >> > > > >> debugging > > >> > > > >> > information. > > >> > > > >> > > > >> > > > >> > Thank you again for looking into this issue. > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > Regards, > > >> > > > >> > Snehasish > > >> > > > >> > > > >> > > > >> > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > > > >
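
The quorum arithmetic discussed in this thread (a majority of 4 voting members is 3, a majority of 3 is 2, and listeners do not vote) can be sketched in a few lines of Java. This is only an illustration and not Ratis code: the class QuorumSketch and the methods majority and canElectLeader are hypothetical names introduced here, and the sketch assumes nothing beyond the usual Raft rule that a leader needs votes from a strict majority of the voting members. It does not model the listener role-transition problem that, according to the thread, stopped reproducing on 3.3.0-SNAPSHOT.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Apache Ratis.
final class QuorumSketch {

  /** Smallest number of voting members that forms a strict majority. */
  static int majority(int votingMembers) {
    return votingMembers / 2 + 1;
  }

  /**
   * Returns true if the voting members that are still alive form a majority.
   * Listeners are simply left out of the voters list because they do not vote.
   */
  static boolean canElectLeader(List<String> voters, Set<String> alive) {
    long aliveVoters = voters.stream().filter(alive::contains).count();
    return aliveVoters >= majority(voters.size());
  }

  public static void main(String[] args) {
    // n3 and n1 have been killed; only n2 and n4 are still running.
    Set<String> alive = Set.of("n2", "n4");

    // n4 promoted while n3 is still a voting member: 4 voters, majority is 3.
    System.out.println(canElectLeader(List.of("n1", "n2", "n3", "n4"), alive)); // false

    // n3 removed before n1 is killed: 3 voters, majority is 2.
    System.out.println(canElectLeader(List.of("n1", "n2", "n4"), alive)); // true
  }
}
```

In the second case the arithmetic says an election should succeed, which matches the later observation in the thread that the failure came from n4 still rejecting votes as a listener rather than from a lost quorum.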
