Hello,

Good news, everyone! This issue is no longer reproducible with version
3.3.0-SNAPSHOT.
Based on the merged PRs, I think https://github.com/apache/ratis/pull/1331
fixed the listener role transition issue.

Thank you for looking into this.


Regards,
Snehasish


On Sat, 17 Jan 2026 at 01:29, Tsz Wo Sze <[email protected]> wrote:

> > ... : n4@group-ABB3109A44C2 replies to PRE_VOTE vote request:
> > n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state: n4@group-ABB3109A44C2:t1,
> > leader=n1, voted=null,
> > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1,
> > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001,
> > n4|0.0.0.0:9003]|listeners:[], old=null}
> > ... : n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE, n2,
> > group-ABB3109A44C2, 1, (t:1, i:16))
> > ... : n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this
> > server is a listener, who is a non-voting member
>
> According to the above log, you actually removed n3.  Somehow n4 rejected
> the voteRequest from n2 and said that it was a non-voting member.
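The quoted log shows a mismatch: n4's own configuration (index 15) lists it under `peers`, yet its VoteContext still identifies it as a LISTENER. Below is a minimal Java sketch of the invariant one might expect, assuming a server's voting role should follow from the configuration it currently holds. `VoterCheck`, `Conf`, `Role`, `expectedRole` and `mayRejectAsListener` are hypothetical names; they are not Ratis classes, and the sketch does not describe Ratis's code or the change made in PR #1331.

```java
import java.util.Set;

// Hypothetical model of the invariant that n4's logs appear to violate.
// None of these names are Ratis classes or methods.
public class VoterCheck {
    // Mirrors the "conf: {index: ..., cur=peers:[...]|listeners:[...]}" log field.
    record Conf(long index, Set<String> peers, Set<String> listeners) {}

    enum Role { FOLLOWER, LISTENER }

    // Assumption: the role a server holds should follow from the conf it has.
    static Role expectedRole(Conf conf, String self) {
        if (conf.peers().contains(self)) {
            return Role.FOLLOWER; // voting member
        }
        if (conf.listeners().contains(self)) {
            return Role.LISTENER; // non-voting member
        }
        throw new IllegalArgumentException(self + " is not in conf " + conf.index());
    }

    // Rejecting a vote as "a non-voting member" is only justified when the
    // current conf actually lists this server as a listener.
    static boolean mayRejectAsListener(Conf conf, String self) {
        return expectedRole(conf, self) == Role.LISTENER;
    }

    public static void main(String[] args) {
        // n4's conf at index 15: peers [n1, n2, n4], no listeners.
        Conf conf15 = new Conf(15, Set.of("n1", "n2", "n4"), Set.of());
        Role logged = Role.LISTENER; // n4 logged itself as "...-LISTENER"
        Role expected = expectedRole(conf15, "n4");
        System.out.println("logged=" + logged + " expected=" + expected
                + " mismatch=" + (logged != expected));
        System.out.println("may reject PRE_VOTE as listener: "
                + mayRejectAsListener(conf15, "n4"));
    }
}
```

Running `main` prints `logged=LISTENER expected=FOLLOWER mismatch=true` and then `may reject PRE_VOTE as listener: false`; under this model, n4 had no grounds to reject n2's PRE_VOTE as a non-voting member.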
>
> It seems like a bug. Could you share all the commands you ran? I can
> try to reproduce it.
>
> Thanks.
> Tsz-Wo
>
>
>
> On Fri, Jan 16, 2026 at 11:47 AM Tsz Wo Sze <[email protected]> wrote:
>
> > Hi Snehasish,
> >
> > Thanks for providing the details!
> >
> > > 3. After killing n3 and promoting n4 as follower
> > > ...
> > > 4. After killing n1 (leader) instance
> >
> > As Xinyu mentioned, you have probably changed the voting member size from
> > 3 to 4. So, killing two servers, n3 and n1, makes it impossible to elect a
> > new leader, since a majority requires 3 voting members.
> >
> > I guess you ran setConf from [follower: n1,n2,n3; listener: n4]
> > - to [follower: n1,n2,n3,n4]
> >
> > To keep the voting member size at 3, I suggest that you setConf
> > - to [follower: n1,n2,n4; listener: n3]
> > (or remove n3, i.e. setConf to [follower: n1,n2,n4])
> >
> > Hope it helps.
> > Tsz-Wo
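Tsz-Wo's reasoning is ordinary Raft majority arithmetic: with N voting members, at least N/2 + 1 (integer division) must be up to elect a leader, and listeners do not count as voters. A small Java sketch, assuming exactly that rule and treating n1 and n3 as down; `QuorumMath`, `majority` and `hasLiveMajority` are illustrative names, not Ratis APIs.

```java
import java.util.List;
import java.util.Set;

// Plain Raft majority arithmetic for the two configurations discussed above.
// Illustrative names only; this is not Ratis code.
// Listeners are not voters, so they never appear in the voter sets below.
public class QuorumMath {
    // Smallest number of voters that is strictly more than half.
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    // True if the voters that are still up can form a majority
    // (a necessary condition for electing a new leader).
    static boolean hasLiveMajority(Set<String> voters, Set<String> down) {
        long alive = voters.stream().filter(v -> !down.contains(v)).count();
        return alive >= majority(voters.size());
    }

    public static void main(String[] args) {
        Set<String> down = Set.of("n1", "n3"); // n3 killed first, then leader n1
        List<Set<String>> confs = List.of(
                Set.of("n1", "n2", "n3", "n4"), // n4 promoted, n3 still a voter
                Set.of("n1", "n2", "n4"));      // n3 removed or made a listener
        for (Set<String> voters : confs) {
            System.out.printf("voters=%d majority=%d liveMajority=%b%n",
                    voters.size(), majority(voters.size()),
                    hasLiveMajority(voters, down));
        }
    }
}
```

This prints `voters=4 majority=3 liveMajority=false` followed by `voters=3 majority=2 liveMajority=true`: with four voters, n2 and n4 alone cannot reach a majority, while with three voters they can.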
> >
> >
> > On Thu, Jan 15, 2026 at 9:44 PM Xinyu Tan <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> Okay, I understand your question!
> >>
> >> Congratulations, you may have found a potential bug.
> >>
> >> Next, you can further investigate Ratis's leader election mechanism and
> >> see why the other two followers refused to vote!
> >>
> >> You may want to look at the code that emits these log messages, such as [1][2].
> >>
> >> [1]
> >> https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1428
> >> [2]
> >> https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java#L442-L481
> >>
> >> Best
> >> --------------
> >> Xinyu Tan
> >>
> >> On 2026/01/16 05:16:04 Snehasish Roy wrote:
> >> > Hello,
> >> >
> >> > Thank you for your prompt response.
> >> >
> >> > > You only killed n3 instead of removing it from the cluster, and n1 and n2
> >> > > formed the quorum.
> >> > I did remove n3 from the cluster before promoting n4 to follower. This
> >> > was successful because n1 and n2 were still online.
> >> >
> >> > This is evident from the info below, which shows that n1, n2 and n4 are in
> >> > the cluster.
> >> >
> >> > ```
> >> > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> >> > multiple MetricRegistries: [class
> >> > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> >> > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> >> > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> >> > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > leader info: n1(0.0.0.0:9000)
> >> >
> >> > [server {
> >> >   id: "n1"
> >> >   address: "0.0.0.0:9000"
> >> >   startupRole: FOLLOWER
> >> > }
> >> > commitIndex: 16
> >> > , server {
> >> >   id: "n2"
> >> >   address: "0.0.0.0:9001"
> >> >   startupRole: FOLLOWER
> >> > }
> >> > commitIndex: 16
> >> > , server {
> >> >   id: "n4"
> >> >   address: "0.0.0.0:9003"
> >> >   startupRole: FOLLOWER
> >> > }
> >> > commitIndex: 16
> >> > ]
> >> > applied {
> >> >   term: 1
> >> >   index: 16
> >> > }
> >> > committed {
> >> >   term: 1
> >> >   index: 16
> >> > }
> >> > lastEntry {
> >> >   term: 1
> >> >   index: 16
> >> > }
> >> > ```
> >> >
> >> > After killing n1, the logs from n2 also list the configuration, which
> >> > clearly shows the peers as n1, n2 and n4.
> >> > The logs also demonstrate that n2 is asking for votes from n1 and n4, not
> >> > from n3, indicating that the cluster is only a 3-node cluster.
> >> >
> >> > ```
> >> > INFO  [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE round 0:
> >> > submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> >> > ```
> >> >
> >> > Please let me know if I misunderstood something.
> >> >
> >> >
> >> >
> >> > Regards,
> >> > Snehasish
> >> >
> >> > On Fri, 16 Jan 2026 at 08:49, Xinyu Tan <[email protected]> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > In your scenario, there are two phenomena:
> >> > >
> >> > > Phenomenon 1
> >> > > You only killed n3 instead of removing it from the cluster, and n1 and n2
> >> > > formed the quorum. As a result, compared to the last test, you were able
> >> > > to successfully promote n4 from listener to follower. This is expected
> >> > > because, during the member change, a majority of the original members
> >> > > n1, n2 and n3 (namely n1 and n2) was still online.
> >> > >
> >> > > Phenomenon 2
> >> > > It is important to note that once you promote n4 to a follower, the group
> >> > > members become n1, n2, n3 and n4. The quorum is now 3 instead of 2, and
> >> > > since n3 has already been killed, killing n1 at this point leaves the
> >> > > consensus group unable to form a quorum of 3 members, making it impossible
> >> > > to elect a new leader. If you wish to perform this action, you can try
> >> > > removing the killed n3 from the group first. This way, the consensus group
> >> > > will only consist of n1, n2 and n4, and the quorum will be 2. At this
> >> > > point, killing n1 should still allow a new leader to be elected, as a
> >> > > quorum of 2 members remains online.
> >> > >
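Phenomenon 1 follows from Raft's joint-consensus rule for membership changes: while the transitional configuration is in effect, a decision needs a majority of the old voters and, separately, a majority of the new voters. The `old=` field in the logged conf suggests Ratis tracks such an old configuration, but the sketch below is only a plain Java model of the rule under that assumption; `JointQuorum`, `hasMajority` and `canCommitChange` are hypothetical names.

```java
import java.util.Set;

// Plain model of Raft's joint-consensus rule: during a membership change,
// agreement needs a majority of the OLD voters AND a majority of the NEW voters.
// Hypothetical names; this does not reproduce Ratis's implementation.
public class JointQuorum {
    // Strictly more than half of the given voters are alive.
    static boolean hasMajority(Set<String> voters, Set<String> alive) {
        long up = voters.stream().filter(alive::contains).count();
        return up > voters.size() / 2;
    }

    static boolean canCommitChange(Set<String> oldVoters, Set<String> newVoters,
                                   Set<String> alive) {
        return hasMajority(oldVoters, alive) && hasMajority(newVoters, alive);
    }

    public static void main(String[] args) {
        Set<String> oldVoters = Set.of("n1", "n2", "n3");
        Set<String> newVoters = Set.of("n1", "n2", "n4");
        // n3 was killed before setConf; n4 (still a listener then) was up.
        Set<String> alive = Set.of("n1", "n2", "n4");
        System.out.println("change can commit: "
                + canCommitChange(oldVoters, newVoters, alive));
        // Had n2 also been down, the old configuration would lack a majority.
        System.out.println("without n2: "
                + canCommitChange(oldVoters, newVoters, Set.of("n1", "n4")));
    }
}
```

This prints `change can commit: true` and then `without n2: false`, matching the explanation that the promotion succeeded only because n1 and n2 were both online.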
> >> > > Your test scenario involves the most complex part of the consensus
> >> > > algorithm: membership changes. I think you should take a closer look at
> >> > > the Raft PhD dissertation [1], which is more detailed than the ATC 2014
> >> > > conference version. I believe that after reading it, you will have a
> >> > > deeper understanding of the Raft algorithm!
> >> > >
> >> > > Looking forward to your next test.
> >> > >
> >> > > [1] https://github.com/ongardie/dissertation/blob/master/stanford.pdf
> >> > >
> >> > > Best
> >> > > -----------------
> >> > > Xinyu Tan
> >> > >
> >> > > On 2026/01/15 15:20:24 Snehasish Roy wrote:
> >> > > > Hello,
> >> > > >
> >> > > > Based on your inputs, I was able to reproduce the issue consistently.
> >> > > >
> >> > > > 1. After starting n1, n2 and n3 nodes
> >> > > >
> >> > > > ```
> >> > > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> >> > > > multiple MetricRegistries: [class
> >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > leader info: n1(0.0.0.0:9000)
> >> > > >
> >> > > > [server {
> >> > > >   id: "n1"
> >> > > >   address: "0.0.0.0:9000"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 8
> >> > > > , server {
> >> > > >   id: "n2"
> >> > > >   address: "0.0.0.0:9001"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 8
> >> > > > , server {
> >> > > >   id: "n3"
> >> > > >   address: "0.0.0.0:9002"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 8
> >> > > > ]
> >> > > > applied {
> >> > > >   term: 1
> >> > > >   index: 8
> >> > > > }
> >> > > > committed {
> >> > > >   term: 1
> >> > > >   index: 8
> >> > > > }
> >> > > > lastEntry {
> >> > > >   term: 1
> >> > > >   index: 8
> >> > > > }
> >> > > > ```
> >> > > >
> >> > > > 2. After adding n4 as listener
> >> > > >
> >> > > > ```
> >> > > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> >> > > > multiple MetricRegistries: [class
> >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > leader info: n1(0.0.0.0:9000)
> >> > > >
> >> > > > [server {
> >> > > >   id: "n1"
> >> > > >   address: "0.0.0.0:9000"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 12
> >> > > > , server {
> >> > > >   id: "n2"
> >> > > >   address: "0.0.0.0:9001"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 12
> >> > > > , server {
> >> > > >   id: "n3"
> >> > > >   address: "0.0.0.0:9002"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 12
> >> > > > , server {
> >> > > >   id: "n4"
> >> > > >   address: "0.0.0.0:9003"
> >> > > >   startupRole: LISTENER
> >> > > > }
> >> > > > commitIndex: 12
> >> > > > ]
> >> > > > applied {
> >> > > >   term: 1
> >> > > >   index: 12
> >> > > > }
> >> > > > committed {
> >> > > >   term: 1
> >> > > >   index: 12
> >> > > > }
> >> > > > lastEntry {
> >> > > >   term: 1
> >> > > >   index: 12
> >> > > > }
> >> > > > ```
> >> > > >
> >> > > > 3. After killing n3 and promoting n4 as follower
> >> > > >
> >> > > > ```
> >> > > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> >> > > > multiple MetricRegistries: [class
> >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > leader info: n1(0.0.0.0:9000)
> >> > > >
> >> > > > [server {
> >> > > >   id: "n1"
> >> > > >   address: "0.0.0.0:9000"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > , server {
> >> > > >   id: "n2"
> >> > > >   address: "0.0.0.0:9001"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > , server {
> >> > > >   id: "n4"
> >> > > >   address: "0.0.0.0:9003"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > ]
> >> > > > applied {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > committed {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > lastEntry {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > ```
> >> > > >
> >> > > > 4. After killing n1 (leader) instance
> >> > > >
> >> > > > ```
> >> > > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> >> > > > multiple MetricRegistries: [class
> >> > > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> >> > > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> >> > > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
> >> > > > at org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupList(AdminProtocolServiceGrpc.java:573)
> >> > > > at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupList(GrpcClientProtocolClient.java:167)
> >> > > > at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:106)
> >> > > > at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
> >> > > > at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
> >> > > > at org.apache.ratis.client.impl.GroupManagementImpl.list(GroupManagementImpl.java:69)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.lambda$getGroupId$1(CliUtils.java:118)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.getGroupId(CliUtils.java:117)
> >> > > > at org.apache.ratis.shell.cli.sh.command.AbstractRatisCommand.run(AbstractRatisCommand.java:70)
> >> > > > at org.apache.ratis.shell.cli.sh.group.GroupInfoCommand.run(GroupInfoCommand.java:47)
> >> > > > at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
> >> > > > at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
> >> > > > Caused by: org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /0.0.0.0:9000
> >> > > > Caused by: java.net.ConnectException: Connection refused
> >> > > > at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> >> > > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
> >> > > > at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> >> > > > at java.base/java.lang.Thread.run(Thread.java:833)
> >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
> >> > > > at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
> >> > > > at org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupInfo(AdminProtocolServiceGrpc.java:580)
> >> > > > at org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupInfo(GrpcClientProtocolClient.java:173)
> >> > > > at org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:110)
> >> > > > at org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
> >> > > > at org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
> >> > > > at org.apache.ratis.client.impl.GroupManagementImpl.info(GroupManagementImpl.java:79)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.lambda$getGroupInfo$2(CliUtils.java:146)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
> >> > > > at org.apache.ratis.shell.cli.CliUtils.getGroupInfo(CliUtils.java:145)
> >> > > > at org.apache.ratis.shell.cli.sh.command.AbstractRatisCommand.run(AbstractRatisCommand.java:71)
> >> > > > at org.apache.ratis.shell.cli.sh.group.GroupInfoCommand.run(GroupInfoCommand.java:47)
> >> > > > at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
> >> > > > at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
> >> > > > Caused by: org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /0.0.0.0:9000
> >> > > > Caused by: java.net.ConnectException: Connection refused
> >> > > > at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> >> > > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
> >> > > > at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
> >> > > > at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> >> > > > at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> >> > > > at java.base/java.lang.Thread.run(Thread.java:833)
> >> > > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> >> > > > leader info: ()
> >> > > >
> >> > > > [server {
> >> > > >   id: "n2"
> >> > > >   address: "0.0.0.0:9001"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > , server {
> >> > > >   id: "n1"
> >> > > >   address: "0.0.0.0:9000"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > , server {
> >> > > >   id: "n4"
> >> > > >   address: "0.0.0.0:9003"
> >> > > >   startupRole: FOLLOWER
> >> > > > }
> >> > > > commitIndex: 16
> >> > > > ]
> >> > > > applied {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > committed {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > lastEntry {
> >> > > >   term: 1
> >> > > >   index: 16
> >> > > > }
> >> > > > ```
> >> > > >
> >> > > > Logs from n4
> >> > > > ```
> >> > > > INFO  [2026-01-15 17:48:06,696] [grpc-default-executor-2]
> >> > > > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote
> >> > > > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state:
> >> > > > n4@group-ABB3109A44C2:t1, leader=n1, voted=null,
> >> > > > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1,
> >> > > > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001,
> >> > > > n4|0.0.0.0:9003]|listeners:[], old=null}
> >> > > > INFO  [2026-01-15 17:48:06,897] [grpc-default-executor-2]
> >> > > > [RaftServer$Division]: n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE,
> >> > > > n2, group-ABB3109A44C2, 1, (t:1, i:16))
> >> > > > INFO  [2026-01-15 17:48:06,897] [grpc-default-executor-2] [VoteContext]:
> >> > > > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a
> >> > > > listener, who is a non-voting member
> >> > > > ```
> >> > > >
> >> > > >
> >> > > > Logs from n2
> >> > > >
> >> > > > ```
> >> > > > INFO  [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE round 0:
> >> > > > submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> >> > > > INFO  [2026-01-15 17:48:03,348] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 got exception
> >> > > > when requesting votes: java.util.concurrent.ExecutionException:
> >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176: PRE_VOTE
> >> > > > REJECTED received 1 response(s) and 1 exception(s):
> >> > > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > > > [LeaderElection]:   Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
> >> > > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176]
> >> > > > [LeaderElection]:   Exception 1: java.util.concurrent.ExecutionException:
> >> > > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > ```
> >> > > >
> >> > > > This indicates that the cluster is in an unstable state.
> >> > > > I am willing to contribute; could you guide me a bit on this?
> >> > > >
> >> > > >
> >> > > > Regards,
> >> > > > Snehasish
> >> > > >
> >> > > > On Wed, 14 Jan 2026 at 08:44, Snehasish Roy <[email protected]> wrote:
> >> > > >
> >> > > > > Hello,
> >> > > > >
> >> > > > >
> >> > > > > Thank you for your inputs. I will check and update this thread.
> >> > > > >
> >> > > > >
> >> > > > > Regards,
> >> > > > > Snehasish
> >> > > > >
> >> > > > > On Wed, 7 Jan, 2026, 8:52 am Xinyu Tan, <[email protected]> wrote:
> >> > > > >
> >> > > > >> Hi Snehasish,
> >> > > > >>
> >> > > > >> In your scenario, if you kill n3, which is acting as a follower, the
> >> > > > >> cluster will have 3 non-listeners and 1 listener, with one follower
> >> > > > >> already offline. At this point, the majority situation becomes quite
> >> > > > >> risky, because if any other non-listener goes down, the Raft group will
> >> > > > >> not be able to form a quorum and elect a new leader.
> >> > > > >>
> >> > > > >> Although you have promoted n4 to a follower and removed n3, before this
> >> > > > >> request completes, the majority of the Raft group is still 2.
> >> > > > >> Therefore, after you kill n1, a new leader cannot be elected. In my
> >> > > > >> understanding, this phenomenon is not a bug and aligns with the
> >> > > > >> expected behavior of the algorithm.
> >> > > > >>
> >> > > > >> If you want to test how to safely promote a listener to a follower, make
> >> > > > >> sure that until the promotion request completes (you can confirm this
> >> > > > >> with shell commands, as suggested by Sze), a majority of the current
> >> > > > >> leader and follower members stays online. Otherwise, the promotion will
> >> > > > >> not succeed, and this is not a problem with the implementation but a
> >> > > > >> boundary of the Raft algorithm.
> >> > > > >>
> >> > > > >> Feel free to do more testing on this feature of Ratis. If you encounter
> >> > > > >> either of the following issues, it would indicate that there is indeed
> >> > > > >> a problem with the implementation, and we welcome discussions and
> >> > > > >> contributions:
> >> > > > >> * You find that even with a majority of leader and follower members
> >> > > > >> online, you still cannot successfully promote a listener to a
> >> > > > >> follower.
> >> > > > >> * In your case, because the majority was not maintained, the member
> >> > > > >> change failed. But after you restart n1 or n3 and re-establish the
> >> > > > >> majority, the Raft group still cannot elect a leader, or elects a
> >> > > > >> leader but fails to perform member changes.
> >> > > > >>
> >> > > > >> We look forward to your testing.
> >> > > > >>
> >> > > > >> Best
> >> > > > >> --------------
> >> > > > >> Xinyu Tan
> >> > > > >>
> >> > > > >>
> >> > > > >> On 2025/12/29 10:53:40 Snehasish Roy wrote:
> >> > > > >> > Hello everyone,
> >> > > > >> >
> >> > > > >> > Happy holidays. This is my first email to this community, so kindly
> >> > > > >> > excuse any mistakes.
> >> > > > >> >
> >> > > > >> > I initially started a 3-node Ratis cluster and then added a listener to
> >> > > > >> > the cluster using setConfiguration(List.of(n1,n2,n3), List.of(n4)),
> >> > > > >> > based on the following documentation:
> >> > > > >> > https://jojochuang.github.io/ratis-site/docs/developer-guide/listeners
> >> > > > >> >
> >> > > > >> > ```
> >> > > > >> > INFO  [2025-12-29 15:57:01,887] [n1-server-thread1] [RaftServer$Division]:
> >> > > > >> > n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration
> >> > > > >> > SetConfigurationRequest:client-044D31187FB4->n1@group-ABB3109A44C2, cid=3,
> >> > > > >> > seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n3|0.0.0.0:9002], listeners:[n4|0.0.0.0:9003]
> >> > > > >> > ```
> >> > > > >> >
> >> > > > >> > Then I killed one of the Ratis follower nodes (n3) and promoted the
> >> > > > >> > listener to follower using the setConfiguration(List.of(n1,n2,n4))
> >> > > > >> > command, to keep the cluster size at 3.
> >> > > > >> > Please note that n3 has been removed from the list of followers, there
> >> > > > >> > are no more listeners in the cluster, and no failures were observed
> >> > > > >> > while issuing the command.
> >> > > > >> >
> >> > > > >> > ```
> >> > > > >> > INFO  [2025-12-29 16:02:54,227] [n1-server-thread2] [RaftServer$Division]:
> >> > > > >> > n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration
> >> > > > >> > SetConfigurationRequest:client-2438CA24E2F3->n1@group-ABB3109A44C2, cid=4,
> >> > > > >> > seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003], listeners:[]
> >> > > > >> > ```
> >> > > > >> >
> >> > > > >> > Then I killed the leader instance n1, after which n2 attempted to become
> >> > > > >> > the leader and started asking for votes from n1 and n4. There is no
> >> > > > >> > response from n1, as it is not alive, and n4 rejects the PRE_VOTE
> >> > > > >> > request from n2 because it still thinks it is a listener.
> >> > > > >> >
> >> > > > >> > Logs from n2
> >> > > > >> > ```
> >> > > > >> > INFO  [2025-12-29 16:04:10,051] [n2@group-ABB3109A44C2-LeaderElection30]
> >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 PRE_VOTE round 0:
> >> > > > >> > submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> >> > > > >> > INFO  [2025-12-29 16:04:10,052] [n2@group-ABB3109A44C2-LeaderElection30]
> >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 got exception when
> >> > > > >> > requesting votes: java.util.concurrent.ExecutionException:
> >> > > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > >> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30]
> >> > > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30: PRE_VOTE REJECTED
> >> > > > >> > received 1 response(s) and 1 exception(s):
> >> > > > >> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30]
> >> > > > >> > [LeaderElection]:   Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
> >> > > > >> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30]
> >> > > > >> > [LeaderElection]:   Exception 1: java.util.concurrent.ExecutionException:
> >> > > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> >> > > > >> > ```
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > Due to the lack of a leader, the cluster is no longer stable.
> >> > > > >> >
> >> > > > >> > Logs from n4
> >> > > > >> > ```
> >> > > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2]
> >> > > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE,
> >> > > > >> > n2, group-ABB3109A44C2, 1, (t:1, i:16))
> >> > > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2] [VoteContext]:
> >> > > > >> > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a
> >> > > > >> > listener, who is a non-voting member
> >> > > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2]
> >> > > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote
> >> > > > >> > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state:
> >> > > > >> > n4@group-ABB3109A44C2:t1, leader=n1, voted=null,
> >> > > > >> > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1,
> >> > > > >> > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001,
> >> > > > >> > n4|0.0.0.0:9003]|listeners:[], old=null}
> >> > > > >> > ```
> >> > > > >> >
> >> > > > >> > So my question is: how do I correctly promote a listener to a follower?
> >> > > > >> > Did I miss a step, or is there a bug in the code? If it's the latter, I
> >> > > > >> > would be happy to contribute. Please let me know if you need any more
> >> > > > >> > debugging information.
> >> > > > >> >
> >> > > > >> > Thank you again for looking into this issue.
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > Regards,
> >> > > > >> > Snehasish
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>
