Hi Luke,

Thanks for the reply and the questions.

RE LC1:

What is the state that a voter is not registered in the cluster?

While a running voter X is not registered in the cluster, a feature
upgrade that would not be allowed if X were registered can occur before
X completes its new registration. See JR2 for more discussion of this
race. In general, unregistering a node assumes the node has already been
taken down. However, I think Kafka should be able to "recover" from an
accidental unregistration. That is the motivation for "re-registering"
within the same process lifetime. Otherwise, we would need to restart
the controller node for it to register again, which could impact quorum
availability. I think it would be nice to have the controller reject
unregistering voters, but that requires leaking the voter set, which is
internal to KRaft, to the metadata layer. See JR3 for more on that
point.

What will happen if the unregistered voter is the leader controller?

I think this is something we can disallow without leaking the voter set
to the metadata layer. I think the reason this piece of KRaft state (who
the leader is) needs to be exposed, but the voter set does not (at least
for now), is that the active controller is the KRaft leader and is
therefore the one that can write new offsets to the log. Ideally, the
controllers should not be aware of the KRaft voters, because the set of
inactive controllers is not necessarily the voter set minus the leader.
If the active controller gets a request to unregister itself, it can
return an `INVALID_REQUEST` error. What do you think?

Best,
Kevin Wu

On Thu, May 14, 2026 at 1:25 AM Luke Chen <[email protected]> wrote:

> Hi Kevin,
>
> Sorry for the late review.
> I have a question:
>
> LC1. We said we can allow users to "unregister an active voter", but will
> re-register it later.
> What is the state that a voter is not registered in the cluster?
> What will happen if the unregistered voter is the leader controller?
> We did many protections when adding/removing voters to avoid the broken
> quorum, so I think we should also have some validation to check if this is
> a voter before unregistering it.
>
>
> Thanks,
> Luke
>
> On Wed, May 13, 2026 at 8:20 PM José Armando García Sancio via dev <
> [email protected]> wrote:
>
> > Hi Kevin,
> >
> > On Tue, May 12, 2026 at 8:42 PM Kevin Wu <[email protected]> wrote:
> > > RE JS1: I like the idea of a separate `CONTROLER_ID_NOT_REGISTERED` error
> > > code for unregistering a controller which is not registered. I have updated
> > > the KIP with this.
> > >
> > > RE JS2: Another case where reusing the ApiKey 1 metadata record may not be
> > > a good idea is for a combined node, where the broker and controller share
> > > the same node id. When the controller replays this record, should it
> > > unregister the broker or the controller? The answer is not super obvious. I
> > > think the only way to distinguish between the broker and controller
> > > un-registrations is by looking if the `brokerEpoch` is set in the record,
> > > but that seems less intuitive than introducing a separate record. I have
> > > updated the KIP with this case too.
> >
> > Yes, we discussed both issues offline and decided to introduce a new
> > error for CONTROLER_ID_NOT_REGISTERED and a new unregistration
> > controller record. Because of the combined mode, a controller
> > registration and a broker registration can share the same node id.
> > Without a separate unregistration record, it is difficult to determine
> > during replay which entity is being unregistered.
> >
> > Thanks,
> > --
> > -José
> >
>
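
To make the combined-node point in the quoted thread concrete, here is a
rough sketch of why a separate controller unregistration record keeps
replay unambiguous. Everything in it is an assumption for illustration:
the class, record, and field names are hypothetical stand-ins, not
Kafka's generated metadata record classes or the schema in the KIP.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: hypothetical stand-ins, not Kafka's real record classes.
final class UnregistrationReplaySketch {

    // Stand-in for the existing broker unregistration record (ApiKey 1).
    record UnregisterBroker(int brokerId, long brokerEpoch) {}

    // Stand-in for the new, separate controller unregistration record.
    record UnregisterController(int controllerId) {}

    // Minimal registration state; a combined node appears in both sets.
    static final class Registrations {
        final Set<Integer> brokers = new HashSet<>();
        final Set<Integer> controllers = new HashSet<>();

        // Replaying the broker record only touches broker registrations.
        void replay(UnregisterBroker record) {
            brokers.remove(record.brokerId());
        }

        // Replaying the controller record only touches controller registrations.
        void replay(UnregisterController record) {
            controllers.remove(record.controllerId());
        }
    }

    public static void main(String[] args) {
        Registrations state = new Registrations();
        // Combined node: the broker and the controller share node id 1.
        state.brokers.add(1);
        state.controllers.add(1);

        // The record type alone says which registration to remove.
        state.replay(new UnregisterController(1));

        System.out.println("broker 1 registered: " + state.brokers.contains(1));
        System.out.println("controller 1 registered: " + state.controllers.contains(1));
    }
}
```

With a single shared record, replay would have to inspect something like
whether `brokerEpoch` is set to decide between the two registrations. With
two record types, that decision falls out of the record type itself.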
