[
https://issues.apache.org/jira/browse/KAFKA-19850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034381#comment-18034381
]
Paolo Patierno commented on KAFKA-19850:
----------------------------------------
We are seeing the same UX issue while implementing dynamic quorum support with
controller scaling in Strimzi (running Apache Kafka on Kubernetes). It shows up
specifically when a user starts with some mixed nodes (broker + controller),
scales up by adding dedicated controller nodes, and then wants to remove the
controller role from the mixed nodes. In that case, the mixed nodes are not
shut down and removed for good; they are rolled (restarted without the
controller role). As Luke pointed out, in a cloud-native environment there is
no window between the shutdown and the restart of a pod in which an operator
could take action (such as issuing the remove Raft voter call), because the
restart is outside the operator's control and is done by Kubernetes itself.
For this reason, it would be great to be able to call remove Raft voter before
shutting down the nodes (the opposite of what the documentation says when
using auto-join), which works fine when auto-join is not enabled.
But at the same time, auto-join is a very useful feature that, even more so in
an automated scenario, helps with controller registration when scaling up.
So I would be in favor of allowing the removal of Raft voters before they are
scaled down, while avoiding the immediate re-join.
The FollowerState class could track that the node is being removed, which could
help skip the immediate re-join in shouldSendAddOrRemoveVoterRequest (in
KafkaRaftClient).
Of course, the node would still be able to join again on restart, or when the
user runs the corresponding command manually.
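To make the idea concrete, here is a minimal, self-contained sketch of the
decision logic. It is not the actual KafkaRaftClient or FollowerState code: the
class AutoJoinGate, its fields, and its methods are hypothetical names used
only for illustration, and the real implementation would have to fit into the
existing Raft state handling.

```java
/**
 * Simplified, hypothetical model of the auto-join decision.
 * Not the real KafkaRaftClient/FollowerState code; all names are assumptions.
 */
final class AutoJoinGate {
    // Models controller.quorum.auto.join.enable.
    private final boolean autoJoinEnabled;
    // Whether this node is currently part of the voter set.
    private boolean inVoterSet;
    // In-memory only: set when the node is removed while running,
    // and therefore lost (reset) when the process restarts.
    private boolean removedSinceStart;

    AutoJoinGate(boolean autoJoinEnabled, boolean inVoterSet) {
        this.autoJoinEnabled = autoJoinEnabled;
        this.inVoterSet = inVoterSet;
        this.removedSinceStart = false;
    }

    /** The node observes that it has been removed from the voter set. */
    void onRemovedFromVoters() {
        inVoterSet = false;
        removedSinceStart = true;
    }

    /** The node observes that it has been added (automatically or manually). */
    void onAddedToVoters() {
        inVoterSet = true;
    }

    /** Models a process restart: only the voter-set membership survives. */
    AutoJoinGate restart() {
        return new AutoJoinGate(autoJoinEnabled, inVoterSet);
    }

    /** True if the node should try to add itself as a voter automatically. */
    boolean shouldAutoJoin() {
        return autoJoinEnabled && !inVoterSet && !removedSinceStart;
    }
}
```

With this model, a running controller that has just been removed returns false
from shouldAutoJoin(), so an operator can remove the voter first and roll the
pod afterwards. A node that restarts while still holding the controller role
would auto-join again, and a manual add is always possible.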
> KRaft voter auto join will add a removed voter immediately
> ----------------------------------------------------------
>
> Key: KAFKA-19850
> URL: https://issues.apache.org/jira/browse/KAFKA-19850
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 4.2.0
> Reporter: Luke Chen
> Priority: Major
>
> In v4.2.0, we are able to auto join a controller with the configuration
> `controller.quorum.auto.join.enable=true` set
> ([KIP-853|https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes#KIP853:KRaftControllerMembershipChanges-Controllerautojoining], KAFKA-19078).
> This is a good improvement for controller addition, but it has a UX issue:
> when a controller is removed via removeVoterRequest, it is added back
> immediately because of `controller.quorum.auto.join.enable=true`. The KIP
> also mentions that you have to stop the controller before removing it:
>
> {noformat}
> controller.quorum.auto.join.enable:
> Controls whether a KRaft controller should automatically join the cluster
> metadata partition for its cluster id. If the configuration is set to
> true the controller must be stopped before removing the controller with
> kafka-metadata-quorum remove-controller.{noformat}
>
> This is not user-friendly behavior in my opinion. It will cause a lot of
> confusion for users, who will think there is something wrong with the
> controller removal. Furthermore, in a Kubernetes environment controlled by
> an operator, it is not the cloud-native way to shut down a node, perform some
> operation, and then start it up again.
>
> So, I propose we improve it with the rule "a removed controller will not be
> auto-joined until that controller is restarted". That is:
> 1. Once the controller is removed from the voter set, it won't be auto-joined
> even if `controller.quorum.auto.join.enable=true`.
> 2. The controller can still be manually added to the voter set in this state.
> 3. The controller node will auto-join the voter set after it is restarted.
>
> So basically, the semantics are not changed; it just avoids an unexpected
> remove/add loop. Thoughts?
>
>