Hello,

Thank you for your inputs. I will check and update this thread.
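
For reference, here is a minimal sketch in plain Java of the majority arithmetic described in the reply below. The class and method names (QuorumSketch, majority, hasQuorum) are hypothetical and not part of the Ratis API. It only counts live voting members and does not model joint consensus or any Ratis internals.

```java
import java.util.List;
import java.util.Set;

public class QuorumSketch {
    // Smallest number of voting members that forms a majority.
    public static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // True if enough of the voting members are alive to form a majority.
    // Listeners are non-voting, so they must not appear in `voters`.
    public static boolean hasQuorum(List<String> voters, Set<String> alive) {
        long up = voters.stream().filter(alive::contains).count();
        return up >= majority(voters.size());
    }

    public static void main(String[] args) {
        // Voting members before the promotion request completes (n4 is a listener).
        List<String> voters = List.of("n1", "n2", "n3");

        // n3 killed: n1 and n2 still form a majority of 2.
        System.out.println(hasQuorum(voters, Set.of("n1", "n2", "n4"))); // true

        // n1 killed as well: only n2 can vote, so no leader can be elected.
        System.out.println(hasQuorum(voters, Set.of("n2", "n4")));       // false
    }
}
```

Under these assumptions, n4 being alive does not help while it is still counted as a listener, which matches the rejected PRE_VOTE in the logs below.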


Regards,
Snehasish

On Wed, 7 Jan, 2026, 8:52 am Xinyu Tan, <[email protected]> wrote:

> Hi Snehasish,
>
> In your scenario, if you kill n3, which is acting as a follower, the
> cluster will have 3 non-listeners and 1 listener, with one follower already
> offline. At this point the quorum is fragile: if any other non-listener
> goes down, the Raft group will not be able to form a majority and elect a
> new leader.
>
> Although you have asked to promote n4 from a listener to a follower and
> to remove n3, until this request completes the majority of the Raft group
> is still 2. Therefore, after you kill n1, a new leader cannot be elected.
> In my understanding, this is not a bug and matches the expected behavior
> of the algorithm.
>
> If you want to test how to safely promote a listener to a follower, make
> sure that until the promotion request completes (you can confirm this with
> shell commands as suggested by sze), a majority of the current leader and
> follower members stays online. Otherwise, the promotion will not succeed.
> This is not a problem with the implementation but a boundary of the Raft
> algorithm.
>
> Feel free to do more testing on this feature of Ratis. If you encounter
> either of the following issues, it would indicate a real problem with the
> implementation, and we welcome discussions and contributions:
> * You find that even with the majority of leader and follower members
> online, you still cannot successfully promote a listener to a follower.
> * In your case, because the majority was not maintained, the member change
> failed. But after you restart n1 or n3 and re-establish the majority, the
> Raft group still cannot elect a leader or elects a leader but fails to
> perform member changes.
>
> We look forward to your testing.
>
> Best
> --------------
> Xinyu Tan
>
>
> On 2025/12/29 10:53:40 Snehasish Roy wrote:
> > Hello everyone,
> >
> > Happy Holidays. This is my first email to this community so kindly excuse
> > me for any mistakes.
> >
> > I initially started a 3-node Ratis cluster and then added a listener to
> > the cluster using setConfiguration(List.of(n1,n2,n3), List.of(n4)), based
> > on the following documentation:
> > https://jojochuang.github.io/ratis-site/docs/developer-guide/listeners
> >
> > ```
> > INFO  [2025-12-29 15:57:01,887] [n1-server-thread1] [RaftServer$Division]: n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration SetConfigurationRequest:client-044D31187FB4->n1@group-ABB3109A44C2, cid=3, seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n3|0.0.0.0:9002], listeners:[n4|0.0.0.0:9003]
> > ```
> >
> > Then I killed one of the Ratis follower nodes (n3) and promoted the
> > listener to a follower using the setConfiguration(List.of(n1,n2,n4))
> > command to maintain the cluster size of 3.
> > Please note that n3 has been removed from the list of followers, there
> > are no more listeners in the cluster, and no failures were observed
> > while issuing the command.
> >
> > ```
> > INFO  [2025-12-29 16:02:54,227] [n1-server-thread2] [RaftServer$Division]: n1@group-ABB3109A44C2-LeaderStateImpl: startSetConfiguration SetConfigurationRequest:client-2438CA24E2F3->n1@group-ABB3109A44C2, cid=4, seq=null, RW, null, SET_UNCONDITIONALLY, servers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003], listeners:[]
> > ```
> >
> > Then I killed the leader instance n1, after which n2 attempted to become
> > leader and started requesting votes from n1 and n4. There is no response
> > from n1 as it is not alive, and n4 rejects the pre_vote request from n2
> > because it still thinks it is a listener.
> >
> > Logs from n2
> > ```
> > INFO  [2025-12-29 16:04:10,051] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 PRE_VOTE round 0: submit vote requests at term 1 for conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> > INFO  [2025-12-29 16:04:10,052] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30 got exception when requesting votes: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]: n2@group-ABB3109A44C2-LeaderElection30: PRE_VOTE REJECTED received 1 response(s) and 1 exception(s):
> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]:   Response 0: n2<-n4#0:FAIL-t1-last:(t:1, i:16)
> > INFO  [2025-12-29 16:04:10,054] [n2@group-ABB3109A44C2-LeaderElection30] [LeaderElection]:   Exception 1: java.util.concurrent.ExecutionException: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> > ```
> >
> >
> > Without a leader, the cluster is no longer stable.
> >
> > Logs from n4
> > ```
> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2: receive requestVote(PRE_VOTE, n2, group-ABB3109A44C2, 1, (t:1, i:16))
> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2] [VoteContext]: n4@group-ABB3109A44C2-LISTENER: reject PRE_VOTE from n2: this server is a listener, who is a non-voting member
> > INFO  [2025-12-29 16:05:03,405] [grpc-default-executor-2] [RaftServer$Division]: n4@group-ABB3109A44C2 replies to PRE_VOTE vote request: n2<-n4#0:FAIL-t1-last:(t:1, i:16). Peer's state: n4@group-ABB3109A44C2:t1, leader=n1, voted=null, raftlog=Memoized:n4@group-ABB3109A44C2-SegmentedRaftLog:OPENED:c16:last(t:1, i:16), conf=conf: {index: 15, cur=peers:[n1|0.0.0.0:9000, n2|0.0.0.0:9001, n4|0.0.0.0:9003]|listeners:[], old=null}
> > ```
> >
> > So my question is: how do I correctly promote a listener to a follower?
> > Did I miss a step, or is there a bug in the code? If it's the latter, I
> > would be happy to contribute. Please let me know if you need any more
> > debugging information.
> >
> > Thank you again for looking into this issue.
> >
> >
> > Regards,
> > Snehasish
> >
>
