Hi,

Okay, I understand your question now!

Congratulations, you may have found a potential bug.

As a next step, you can dig further into Ratis's leader election mechanism and 
see why n4, although already listed as a follower in the configuration, still 
refused to vote as a listener!

Based on the logs, the relevant code should be around [1][2].

[1] 
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1428
[2] 
https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java#L442-L481
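
To make the quorum arithmetic in this thread concrete, here is a minimal
sketch. This is only an illustration: `majority` and `canElectLeader` are my
own hypothetical helpers, not the Ratis API.

```java
import java.util.List;

// Minimal sketch of Raft quorum arithmetic (hypothetical helpers, not Ratis API).
public class QuorumSketch {

    // A majority of n voting members is floor(n/2) + 1.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // A leader can be elected only if a majority of the voting members
    // are alive and actually willing to vote.
    static boolean canElectLeader(List<String> voters, List<String> aliveVoters) {
        long alive = voters.stream().filter(aliveVoters::contains).count();
        return alive >= majority(voters.size());
    }

    public static void main(String[] args) {
        List<String> conf = List.of("n1", "n2", "n4");
        // Conf {n1, n2, n4} with n1 killed and n4 voting: 2 of 3 alive.
        System.out.println(canElectLeader(conf, List.of("n2", "n4"))); // true
        // If n4 still refuses to vote as a listener, only n2 counts.
        System.out.println(canElectLeader(conf, List.of("n2")));       // false
    }
}
```

The second case is exactly what the logs show: the conf lists n4 as a peer,
but its vote handling still treats it as a non-voting listener, so n2 alone
cannot reach the majority of 2.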

Best
--------------
Xinyu Tan

On 2026/01/16 05:16:04 Snehasish Roy wrote:
> Hello,
> 
> Thank you for your prompt response.
> 
> > You only killed n3 instead of removing it from the cluster, and n1 and n2
> formed the quorum.
> I did remove n3 from the cluster before promoting n4 to the follower. This
> was successful because n1 and n2 were still online.
> 
> This is evident from the info below, which shows that n1, n2 and n4 are in
> the cluster.
> 
> ```
> ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,
> 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> multiple MetricRegistries: [class
> org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> leader info: n1(0.0.0.0:9000)
> 
> [server {
>   id: "n1"
>   address: "0.0.0.0:9000"
>   startupRole: FOLLOWER
> }
> commitIndex: 16
> , server {
>   id: "n2"
>   address: "0.0.0.0:9001"
>   startupRole: FOLLOWER
> }
> commitIndex: 16
> , server {
>   id: "n4"
>   address: "0.0.0.0:9003"
>   startupRole: FOLLOWER
> }
> commitIndex: 16
> ]
> applied {
>   term: 1
>   index: 16
> }
> committed {
>   term: 1
>   index: 16
> }
> lastEntry {
>   term: 1
>   index: 16
> }
> ```
> 
> After killing n1, the logs from n2 also list the configuration, which
> clearly shows the peers as n1, n2 and n4.
> The logs also demonstrate that n2 is asking for votes from n1 and n4 - not
> from n3, indicating that this is only a 3-node cluster.
> 
> ```
> INFO  [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176]
> [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE round 0:
> submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|
> 0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> ```
> 
> Please let me know if I misunderstood something.
> 
> 
> 
> Regards,
> Snehasish
> 
> On Fri, 16 Jan 2026 at 08:49, Xinyu Tan <[email protected]> wrote:
> 
> > Hi,
> >
> > In your scenario, there are two phenomena:
> >
> > Phenomenon 1
> > You only killed n3 instead of removing it from the cluster, and n1 and n2
> > formed the quorum. As a result, compared to the last test, you were able to
> > successfully promote n4 from listener to follower, which is as expected
> > because, during the member change, the original quorum of n1, n2, and n3,
> > i.e., n1 and n2, were still online.
> >
> > Phenomenon 2
> > It is important to note that once you promote n4 to a follower, the group
> > members become n1, n2, n3, and n4. The quorum is now 3 instead of 2, and
> > since n3 has already been killed, killing n1 at this point would cause the
> > consensus group to fail to form a quorum of 3 members, making it impossible
> > to elect a new leader. If you wish to perform this action, you can try
> > removing the killed n3 from the group first. This way, the consensus group
> > will only consist of n1, n2, and n4, and the quorum will be 2. At this
> > point, killing n1 should allow the election of a new leader, as the quorum
> > of 2 members is still online.
> >
> > Your test scenario involves the most complex part of the consensus
> > algorithm: member changes. I think you should take a closer look at the
> > PhD thesis of Raft's author [1], which is more detailed than the ATC 2014
> > conference version. I believe that after reading it, you will have a
> > deeper understanding of the Raft algorithm!
> >
> > Looking forward to your next test.
> >
> > [1] https://github.com/ongardie/dissertation/blob/master/stanford.pdf
> >
> > Best
> > -----------------
> > Xinyu Tan
> >
> > On 2026/01/15 15:20:24 Snehasish Roy wrote:
> > > Hello,
> > >
> > > Based on your inputs, I was able to reproduce the issue consistently.
> > >
> > > 1. After starting n1, n2 and n3 nodes
> > >
> > > ```
> > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,
> > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> > > multiple MetricRegistries: [class
> > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> > > leader info: n1(0.0.0.0:9000)
> > >
> > > [server {
> > >   id: "n1"
> > >   address: "0.0.0.0:9000"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 8
> > > , server {
> > >   id: "n2"
> > >   address: "0.0.0.0:9001"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 8
> > > , server {
> > >   id: "n3"
> > >   address: "0.0.0.0:9002"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 8
> > > ]
> > > applied {
> > >   term: 1
> > >   index: 8
> > > }
> > > committed {
> > >   term: 1
> > >   index: 8
> > > }
> > > lastEntry {
> > >   term: 1
> > >   index: 8
> > > }
> > > ```
> > >
> > > 2. After adding n4 as listener
> > >
> > > ```
> > > ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,
> > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> > > multiple MetricRegistries: [class
> > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> > > leader info: n1(0.0.0.0:9000)
> > >
> > > [server {
> > >   id: "n1"
> > >   address: "0.0.0.0:9000"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 12
> > > , server {
> > >   id: "n2"
> > >   address: "0.0.0.0:9001"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 12
> > > , server {
> > >   id: "n3"
> > >   address: "0.0.0.0:9002"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 12
> > > , server {
> > >   id: "n4"
> > >   address: "0.0.0.0:9003"
> > >   startupRole: LISTENER
> > > }
> > > commitIndex: 12
> > > ]
> > > applied {
> > >   term: 1
> > >   index: 12
> > > }
> > > committed {
> > >   term: 1
> > >   index: 12
> > > }
> > > lastEntry {
> > >   term: 1
> > >   index: 12
> > > }
> > > ```
> > >
> > > 3. After killing n3 and promoting n4 as follower
> > >
> > > ```
> > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,
> > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> > > multiple MetricRegistries: [class
> > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> > > leader info: n1(0.0.0.0:9000)
> > >
> > > [server {
> > >   id: "n1"
> > >   address: "0.0.0.0:9000"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > , server {
> > >   id: "n2"
> > >   address: "0.0.0.0:9001"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > , server {
> > >   id: "n4"
> > >   address: "0.0.0.0:9003"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > ]
> > > applied {
> > >   term: 1
> > >   index: 16
> > > }
> > > committed {
> > >   term: 1
> > >   index: 16
> > > }
> > > lastEntry {
> > >   term: 1
> > >   index: 16
> > > }
> > > ```
> > >
> > > 4. After killing n1 (leader) instance
> > >
> > > ```
> > > ❯ ./ratis sh group info -peers 0.0.0.0:9000,0.0.0.0:9001,0.0.0.0:9002,
> > > 0.0.0.0:9003 -groupid 02511d47-d67c-49a3-9011-abb3109a44c2
> > > [main] WARN org.apache.ratis.metrics.MetricRegistriesLoader - Found
> > > multiple MetricRegistries: [class
> > > org.apache.ratis.metrics.impl.MetricRegistriesImpl, class
> > > org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl]. Using the
> > > first: class org.apache.ratis.metrics.impl.MetricRegistriesImpl
> > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE:
> > io
> > > exception
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
> > > at
> > >
> > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupList(AdminProtocolServiceGrpc.java:573)
> > > at
> > >
> > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupList(GrpcClientProtocolClient.java:167)
> > > at
> > >
> > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:106)
> > > at
> > >
> > org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
> > > at
> > >
> > org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
> > > at
> > >
> > org.apache.ratis.client.impl.GroupManagementImpl.list(GroupManagementImpl.java:69)
> > > at
> > >
> > org.apache.ratis.shell.cli.CliUtils.lambda$getGroupId$1(CliUtils.java:118)
> > > at
> > >
> > org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
> > > at org.apache.ratis.shell.cli.CliUtils.getGroupId(CliUtils.java:117)
> > > at
> > > org.apache.ratis.shell.cli.sh
> > .command.AbstractRatisCommand.run(AbstractRatisCommand.java:70)
> > > at
> > > org.apache.ratis.shell.cli.sh
> > .group.GroupInfoCommand.run(GroupInfoCommand.java:47)
> > > at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
> > > at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
> > > Caused by:
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> > > Connection refused: /0.0.0.0:9000
> > > Caused by: java.net.ConnectException: Connection refused
> > > at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
> > > at
> > > java.base/sun.nio.ch
> > .SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > > at java.base/java.lang.Thread.run(Thread.java:833)
> > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE:
> > io
> > > exception
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:368)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:349)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:174)
> > > at
> > >
> > org.apache.ratis.proto.grpc.AdminProtocolServiceGrpc$AdminProtocolServiceBlockingStub.groupInfo(AdminProtocolServiceGrpc.java:580)
> > > at
> > >
> > org.apache.ratis.grpc.client.GrpcClientProtocolClient.groupInfo(GrpcClientProtocolClient.java:173)
> > > at
> > >
> > org.apache.ratis.grpc.client.GrpcClientRpc.sendRequest(GrpcClientRpc.java:110)
> > > at
> > >
> > org.apache.ratis.client.impl.BlockingImpl.sendRequest(BlockingImpl.java:147)
> > > at
> > >
> > org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:109)
> > > at org.apache.ratis.client.impl.GroupManagementImpl.info
> > > (GroupManagementImpl.java:79)
> > > at
> > >
> > org.apache.ratis.shell.cli.CliUtils.lambda$getGroupInfo$2(CliUtils.java:146)
> > > at
> > >
> > org.apache.ratis.shell.cli.CliUtils.applyFunctionReturnFirstNonNull(CliUtils.java:72)
> > > at org.apache.ratis.shell.cli.CliUtils.getGroupInfo(CliUtils.java:145)
> > > at
> > > org.apache.ratis.shell.cli.sh
> > .command.AbstractRatisCommand.run(AbstractRatisCommand.java:71)
> > > at
> > > org.apache.ratis.shell.cli.sh
> > .group.GroupInfoCommand.run(GroupInfoCommand.java:47)
> > > at org.apache.ratis.shell.cli.AbstractShell.run(AbstractShell.java:104)
> > > at org.apache.ratis.shell.cli.sh.RatisShell.main(RatisShell.java:62)
> > > Caused by:
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> > > Connection refused: /0.0.0.0:9000
> > > Caused by: java.net.ConnectException: Connection refused
> > > at java.base/sun.nio.ch.Net.pollConnect(Native Method)
> > > at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
> > > at
> > > java.base/sun.nio.ch
> > .SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:336)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:339)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:784)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > > at
> > >
> > org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > > at java.base/java.lang.Thread.run(Thread.java:833)
> > > group id: 02511d47-d67c-49a3-9011-abb3109a44c2
> > > leader info: ()
> > >
> > > [server {
> > >   id: "n2"
> > >   address: "0.0.0.0:9001"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > , server {
> > >   id: "n1"
> > >   address: "0.0.0.0:9000"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > , server {
> > >   id: "n4"
> > >   address: "0.0.0.0:9003"
> > >   startupRole: FOLLOWER
> > > }
> > > commitIndex: 16
> > > ]
> > > applied {
> > >   term: 1
> > >   index: 16
> > > }
> > > committed {
> > >   term: 1
> > >   index: 16
> > > }
> > > lastEntry {
> > >   term: 1
> > >   index: 16
> > > }
> > > ```
> > >
> > > Logs from n4
> > > ```
> > > INFO  [2026-01-15 17:48:06,696] [grpc-default-executor-2]
> > > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote
> > > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state:
> > > n4@group-ABB3109A44C2:t1, leader=n1, voted=null,
> > > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog
> > :OPENED:c16:last(t:1,
> > > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|
> > 0.0.0.0:9001,
> > > n4|0.0.0.0:9003]|listeners:[], old=null}
> > > INFO  [2026-01-15 17:48:06,897] [grpc-default-executor-2]
> > > [RaftServer$Division]: n4@group-ABB3109A44C2: receive
> > requestVote(PRE_VOTE,
> > > n2, group-ABB3109A44C2, 1, (t:1, i:16))
> > > INFO  [2026-01-15 17:48:06,897] [grpc-default-executor-2] [VoteContext]:
> > > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is
> > a
> > > listener, who is a non-voting member
> > > ```
> > >
> > >
> > > Logs from n2
> > >
> > > ```
> > > INFO  [2026-01-15 17:48:03,347] [n2@group-ABB3109A44C2-LeaderElection176
> > ]
> > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 PRE_VOTE
> > round 0:
> > > submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|
> > > 0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> > > INFO  [2026-01-15 17:48:03,348] [n2@group-ABB3109A44C2-LeaderElection176
> > ]
> > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176 got exception
> > > when requesting votes: java.util.concurrent.ExecutionException:
> > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE:
> > io
> > > exception
> > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176
> > ]
> > > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection176: PRE_VOTE
> > > REJECTED received 1 response(s) and 1 exception(s):
> > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176
> > ]
> > > [LeaderElection]:   Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
> > > INFO  [2026-01-15 17:48:03,352] [n2@group-ABB3109A44C2-LeaderElection176
> > ]
> > > [LeaderElection]:   Exception 1: java.util.concurrent.ExecutionException:
> > > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE:
> > io
> > > exception
> > > ```
> > >
> > > This indicates that the cluster is in an unstable state.
> > > I am willing to contribute; could you guide me a bit on this?
> > >
> > >
> > > Regards,
> > > Snehasish
> > >
> > > On Wed, 14 Jan 2026 at 08:44, Snehasish Roy <[email protected]>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >
> > > > Thank you for your inputs. I will check and update this thread.
> > > >
> > > >
> > > > Regards,
> > > > Snehasish
> > > >
> > > > On Wed, 7 Jan, 2026, 8:52 am Xinyu Tan, <[email protected]> wrote:
> > > >
> > > >> Hi, Snehasish
> > > >>
> > > >> In your scenario, if you kill n3, which is acting as a follower, the
> > > >> cluster will have 3 non-listeners and 1 listener, with one follower
> > > >> already offline. At this point, the majority situation becomes quite
> > > >> risky because if any other non-listener goes down, the Raft group
> > > >> will not be able to form a quorum and elect a new leader.
> > > >>
> > > >> Although you have promoted n4 to a follower and removed n3, before
> > > >> this request completes, the majority of the Raft group is still 2.
> > > >> Therefore, after you kill n1, a new leader cannot be elected. In my
> > > >> understanding, this phenomenon is not a bug and aligns with the
> > > >> expected behavior of the algorithm.
> > > >>
> > > >> If you want to test how to safely promote a listener to a follower,
> > > >> make sure that until the promotion request completes (you can confirm
> > > >> this with shell commands, as suggested by sze), a majority of the
> > > >> current leader and follower members stays online. Otherwise, the
> > > >> promotion will not succeed, and this is not a problem with the
> > > >> implementation but a boundary of the Raft algorithm.
> > > >>
> > > >> Feel free to do more testing on this feature of Ratis. If you
> > > >> encounter either of the following, it would indicate that there is
> > > >> indeed a problem with the implementation, and we welcome discussions
> > > >> and contributions:
> > > >> * You find that even with a majority of the leader and follower
> > > >> members online, you still cannot successfully promote a listener to
> > > >> a follower.
> > > >> * In your case, because the majority was not maintained, the member
> > > >> change failed. But after you restart n1 or n3 and re-establish the
> > > >> majority, the Raft group still cannot elect a leader, or elects a
> > > >> leader but fails to perform member changes.
> > > >>
> > > >> We look forward to your testing.
> > > >>
> > > >> Best
> > > >> --------------
> > > >> Xinyu Tan
> > > >>
> > > >>
> > > >> On 2025/12/29 10:53:40 Snehasish Roy wrote:
> > > >> > Hello everyone,
> > > >> >
> > > >> > Happy Holidays. This is my first email to this community so kindly
> > > >> excuse
> > > >> > me for any mistakes.
> > > >> >
> > > >> > I initially started a 3-node Ratis cluster and then added a
> > > >> > listener to the cluster using setConfiguration(List.of(n1,n2,n3),
> > > >> > List.of(n4)), based on the following documentation:
> > > >> > https://jojochuang.github.io/ratis-site/docs/developer-guide/listeners
> > > >> >
> > > >> > ```
> > > >> > INFO  [2025-12-29 15:57:01,887] [n1-server-thread1]
> > > >> [RaftServer$Division]:
> > > >> > n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration
> > > >> > SetConfigurationRequest:client-044D31187FB4->n1@group-ABB3109A44C2,
> > > >> cid=3,
> > > >> > seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000,
> > n2|
> > > >> > 0.0.0.0:9001, n3|0.0.0.0:9002], listeners:[n4|0.0.0.0:9003]
> > > >> > ```
> > > >> >
> > > >> > Then I killed one of the Ratis follower nodes (n3) and promoted
> > > >> > the listener to follower using the
> > > >> > setConfiguration(List.of(n1,n2,n4)) command to maintain a cluster
> > > >> > size of 3.
> > > >> > Please note that n3 has been removed from the list of followers,
> > > >> > there are no more listeners in the cluster, and no failures were
> > > >> > observed while issuing the command.
> > > >> >
> > > >> > ```
> > > >> > INFO  [2025-12-29 16:02:54,227] [n1-server-thread2]
> > > >> [RaftServer$Division]:
> > > >> > n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration
> > > >> > SetConfigurationRequest:client-2438CA24E2F3->n1@group-ABB3109A44C2,
> > > >> cid=4,
> > > >> > seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000,
> > n2|
> > > >> > 0.0.0.0:9001, n4|0.0.0.0:9003], listeners:[]
> > > >> > ```
> > > >> >
> > > >> > Then I killed the leader instance n1, after which n2 attempted to
> > > >> > become the leader and started asking for votes from n1 and n4.
> > > >> > There is no response from n1 as it is not alive, and n4 rejects
> > > >> > the PRE_VOTE request from n2 because it still thinks it is a
> > > >> > listener.
> > > >> >
> > > >> > Logs from n2
> > > >> > ```
> > > >> > INFO  [2025-12-29 16:04:10,051]
> > [n2@group-ABB3109A44C2-LeaderElection30
> > > >> ]
> > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 PRE_VOTE
> > > >> round 0:
> > > >> > submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|
> > > >> > 0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[],
> > old=null}
> > > >> > INFO  [2025-12-29 16:04:10,052]
> > [n2@group-ABB3109A44C2-LeaderElection30
> > > >> ]
> > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 got
> > exception
> > > >> when
> > > >> > requesting votes: java.util.concurrent.ExecutionException:
> > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> > > >> UNAVAILABLE: io
> > > >> > exception
> > > >> > INFO  [2025-12-29 16:04:10,054]
> > [n2@group-ABB3109A44C2-LeaderElection30
> > > >> ]
> > > >> > [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30: PRE_VOTE
> > > >> REJECTED
> > > >> > received 1 response(s) and 1 exception(s):
> > > >> > INFO  [2025-12-29 16:04:10,054]
> > [n2@group-ABB3109A44C2-LeaderElection30
> > > >> ]
> > > >> > [LeaderElection]:   Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
> > > >> > INFO  [2025-12-29 16:04:10,054]
> > [n2@group-ABB3109A44C2-LeaderElection30
> > > >> ]
> > > >> > [LeaderElection]:   Exception 1:
> > > >> java.util.concurrent.ExecutionException:
> > > >> > org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException:
> > > >> UNAVAILABLE: io
> > > >> > exception
> > > >> > ```
> > > >> >
> > > >> >
> > > >> > Due to the lack of a leader, the cluster is no longer stable.
> > > >> >
> > > >> > Logs from n4
> > > >> > ```
> > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2]
> > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2: receive
> > > >> requestVote(PRE_VOTE,
> > > >> > n2, group-ABB3109A44C2, 1, (t:1, i:16))
> > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2]
> > [VoteContext]:
> > > >> > n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this
> > server
> > > >> is a
> > > >> > listener, who is a non-voting member
> > > >> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2]
> > > >> > [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE
> > vote
> > > >> > request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state:
> > > >> > n4@group-ABB3109A44C2:t1, leader=n1, voted=null,
> > > >> > raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog
> > > >> :OPENED:c16:last(t:1,
> > > >> > i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|
> > > >> 0.0.0.0:9001,
> > > >> > n4|0.0.0.0:9003]|listeners:[], old=null}
> > > >> > ```
> > > >> >
> > > >> > So my question is: how do I correctly promote a listener to a
> > > >> > follower? Did I miss some step, or is there a bug in the code? If
> > > >> > it's the latter, I would be happy to contribute. Please let me
> > > >> > know if you need any more debugging information.
> > > >> >
> > > >> > Thank you again for looking into this issue.
> > > >> >
> > > >> >
> > > >> > Regards,
> > > >> > Snehasish
> > > >> >
> > > >>
> > > >
> > >
> >
> 
