Hi all,

We are currently integrating the dynamic voter change feature into our downstream project and found an issue with auto join (KAFKA-19850 <https://issues.apache.org/jira/browse/KAFKA-19850>).

The main problem is that when auto.join is enabled, a voter that has been removed is auto-joined again immediately. We know about this limitation, so we ask users to shut down the node before removing the voter. However, this causes some problems:

1. Broken quorum: Shutting down the to-be-removed controller first might break the quorum in the worst case. For example, take a quorum of 3 controller nodes (C1, C2, C3) where C1 is the leader, C3 has already caught up with C1, and C2 is still catching up with the leader. When users want to remove C3, they follow the guide and shut down C3 first. But at that point the quorum is broken and the Kafka cluster is basically unavailable.

2. Not a cloud-native operation: In a cloud environment (k8s), it's not possible to shut down a node, wait for something to complete, and then start it up again.

3. User confusion: If users don't check the docs first and remove a voter directly with auto.join enabled, the removed node rejoins immediately, which confuses users.

We are currently working on a fix for v4.2.0. Here are our thoughts:

1. Do not auto-join a removed node into the voters until the node is restarted. During this period, the node can still be added manually.

2. Add a timer (e.g. 5 minutes) that starts when a node is removed. The node is auto-joined after the timeout or after a node restart. The timeout can be made configurable in a future release.

Personally, I think solution (1) makes more sense. Solution (2) might also cause an unexpected auto-join if the timer is too short.

I also think we should change the semantics of auto join to "a node is auto-joined only at node startup". This way, we don't have to ask users to shut down the node before removing a voter. I also think this change can be included in v4.2.0 because we haven't released the auto join feature yet.

Do you have any thoughts?
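
To make solution (1) a bit more concrete, here is a minimal sketch in Java of the "auto-join only at node startup" semantics. It is a toy model, not the actual KRaft code: the class AutoJoinSketch and all of its methods are hypothetical names made up for illustration, and the voter set is just an in-memory set of node ids.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model of solution (1); not Kafka's implementation.
class AutoJoinSketch {
    // Current voter set, identified by node id (assumption: plain strings).
    private final Set<String> voters = new HashSet<>();
    // Nodes removed from the voter set since their last startup.
    private final Set<String> removedSinceStartup = new HashSet<>();

    // Manual add is always allowed, even for a recently removed node.
    void addVoter(String nodeId) {
        voters.add(nodeId);
    }

    // Removing a voter blocks auto-join for that node until it restarts.
    void removeVoter(String nodeId) {
        if (voters.remove(nodeId)) {
            removedSinceStartup.add(nodeId);
        }
    }

    // A (re)start clears the block, so auto-join may happen again.
    void onStartup(String nodeId) {
        removedSinceStartup.remove(nodeId);
    }

    // Auto-join only if enabled, not already a voter, and not removed
    // since the node's last startup.
    boolean shouldAutoJoin(String nodeId, boolean autoJoinEnabled) {
        return autoJoinEnabled
            && !voters.contains(nodeId)
            && !removedSinceStartup.contains(nodeId);
    }

    boolean isVoter(String nodeId) {
        return voters.contains(nodeId);
    }

    public static void main(String[] args) {
        AutoJoinSketch quorum = new AutoJoinSketch();
        quorum.addVoter("C1");
        quorum.addVoter("C2");
        quorum.addVoter("C3");

        quorum.removeVoter("C3"); // C3 keeps running
        System.out.println("after removal: " + quorum.shouldAutoJoin("C3", true));

        quorum.onStartup("C3");   // C3 is restarted
        System.out.println("after restart: " + quorum.shouldAutoJoin("C3", true));
    }
}
```

With this shape, the removed node stays out of the voter set while it keeps running, and users never have to shut it down before the removal.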

Thank you,
Luke

On Fri, Mar 29, 2024 at 8:58 AM José Armando García Sancio <[email protected]> wrote:

> Jun, thanks a lot for your help. I feel that the KIP is much better
> after your detailed input.
>
> If there is no more feedback, I'll start a voting thread tomorrow
> morning. I'll monitor KIP-1022's discussion thread and update this KIP
> with anything that affects the KIP's specification.
>
> Thanks,
> --
> -José
>
