Hi Luke,

Thanks for the discussion. I have updated the KIP to mention the `INVALID_REQUEST` error.
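For reference, here is a minimal sketch of the checks discussed in this thread. All names (`ControllerUnregistrationSketch`, `unregisterController`, `ErrorCode`) are hypothetical, and this is not the actual controller code. It assumes only what the thread describes: the active controller answers a request to unregister itself with `INVALID_REQUEST`, answers an unknown id with `CONTROLER_ID_NOT_REGISTERED`, and never consults the KRaft voter set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the unregistration checks discussed in this thread;
// the names are illustrative, not Kafka's actual controller code.
public class ControllerUnregistrationSketch {

    // CONTROLER_ID_NOT_REGISTERED keeps the spelling used in the thread.
    enum ErrorCode { NONE, INVALID_REQUEST, CONTROLER_ID_NOT_REGISTERED }

    private final int activeControllerId; // this node, which is also the KRaft leader
    private final Set<Integer> registeredControllers;

    ControllerUnregistrationSketch(int activeControllerId, Set<Integer> registered) {
        this.activeControllerId = activeControllerId;
        this.registeredControllers = new HashSet<>(registered);
    }

    ErrorCode unregisterController(int controllerId) {
        // The active controller knows it is the leader, so it can refuse to
        // unregister itself without looking at the KRaft voter set.
        if (controllerId == activeControllerId) {
            return ErrorCode.INVALID_REQUEST;
        }
        // Unknown ids get the dedicated error code proposed in JS1.
        if (!registeredControllers.contains(controllerId)) {
            return ErrorCode.CONTROLER_ID_NOT_REGISTERED;
        }
        // Any other registered controller may be unregistered, voter or not;
        // per the thread, a still-running one can register again later.
        registeredControllers.remove(controllerId);
        return ErrorCode.NONE;
    }

    public static void main(String[] args) {
        ControllerUnregistrationSketch controller =
            new ControllerUnregistrationSketch(1, Set.of(1, 2, 3));
        System.out.println(controller.unregisterController(1)); // INVALID_REQUEST
        System.out.println(controller.unregisterController(2)); // NONE
        System.out.println(controller.unregisterController(2)); // CONTROLER_ID_NOT_REGISTERED
    }
}
```

Unregistering a non-leader voter still succeeds in this sketch; rejecting it would require exposing the voter set to the metadata layer, which the thread prefers to avoid.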

Please let me know if you have any other questions or comments.

Best,
Kevin Wu

On Thu, May 14, 2026 at 9:36 PM Luke Chen <[email protected]> wrote:

> Hi Kevin,
>
> > > What is the state that a voter is not registered in the cluster?
> >
> > The state when a running voter X is not registered in the cluster is one
> > where a feature upgrade that would otherwise not be allowed can occur
> > before X completes its new registration. See JR2 for more discussion
> > around this race.
> >
> > In general, unregistering a node assumes the node has already been taken
> > down. However, I think Kafka should be able to "recover" in the event of
> > an accidental unregistration. That is the motivation for "re-registering"
> > in the same process lifetime. Otherwise, we would need to restart the
> > controller node to register again, which could impact quorum
> > availability. I think it would be nice to have the controller reject
> > unregistering voters, but that requires leaking the voter set, which is
> > internal to KRaft, to the metadata layer. See JR3 for more on that point.
>
> Fair enough.
>
> > > What will happen if the unregistered voter is the leader controller?
> >
> > I think this is something we can disallow without leaking the voter set
> > to the metadata layer. I think the reason this piece of KRaft state needs
> > to leak, while the voter set does not (at least for now), is that the
> > active controller is the KRaft leader and therefore can write new offsets
> > to the log. Ideally, the controllers should not be aware of the KRaft
> > voters, because, apart from the leader, the set of inactive controllers
> > is not necessarily the voter set.
> > If the active controller gets a request to unregister itself, it can
> > return an `INVALID_REQUEST` error. What do you think?
>
> Yes, at least we should reject the unregistration of the leader controller.
> Returning the `INVALID_REQUEST` error makes sense to me.
>
>
> Thanks,
> Luke
>
> On Fri, May 15, 2026 at 1:17 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Luke,
> >
> > Thanks for the reply and the questions.
> >
> > RE LC1:
> >
> > > What is the state that a voter is not registered in the cluster?
> >
> > The state when a running voter X is not registered in the cluster is one
> > where a feature upgrade that would otherwise not be allowed can occur
> > before X completes its new registration. See JR2 for more discussion
> > around this race.
> > In general, unregistering a node assumes the node has already been taken
> > down. However, I think Kafka should be able to "recover" in the event of
> > an accidental unregistration. That is the motivation for "re-registering"
> > in the same process lifetime. Otherwise, we would need to restart the
> > controller node to register again, which could impact quorum
> > availability. I think it would be nice to have the controller reject
> > unregistering voters, but that requires leaking the voter set, which is
> > internal to KRaft, to the metadata layer. See JR3 for more on that point.
> >
> > > What will happen if the unregistered voter is the leader controller?
> >
> > I think this is something we can disallow without leaking the voter set
> > to the metadata layer. I think the reason this piece of KRaft state needs
> > to leak, while the voter set does not (at least for now), is that the
> > active controller is the KRaft leader and therefore can write new offsets
> > to the log. Ideally, the controllers should not be aware of the KRaft
> > voters, because, apart from the leader, the set of inactive controllers
> > is not necessarily the voter set.
> > If the active controller gets a request to unregister itself, it can
> > return an `INVALID_REQUEST` error. What do you think?
> >
> > Best,
> > Kevin Wu
> >
> >
> > On Thu, May 14, 2026 at 1:25 AM Luke Chen <[email protected]> wrote:
> >
> > > Hi Kevin,
> > >
> > > Sorry for the late review.
> > > I have a question:
> > >
> > > LC1.
> > > We said we can allow users to "unregister an active voter", but the
> > > voter will re-register later.
> > > What is the state that a voter is not registered in the cluster?
> > > What will happen if the unregistered voter is the leader controller?
> > > We added many protections when adding/removing voters to avoid breaking
> > > the quorum, so I think we should also have some validation to check
> > > whether this is a voter before unregistering it.
> > >
> > >
> > > Thanks,
> > > Luke
> > >
> > > On Wed, May 13, 2026 at 8:20 PM José Armando García Sancio via dev <[email protected]> wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > On Tue, May 12, 2026 at 8:42 PM Kevin Wu <[email protected]> wrote:
> > > > > RE JS1: I like the idea of a separate `CONTROLER_ID_NOT_REGISTERED`
> > > > > error code for unregistering a controller that is not registered. I
> > > > > have updated the KIP with this.
> > > > >
> > > > > RE JS2: Another case where reusing the ApiKey 1 metadata record may
> > > > > not be a good idea is a combined node, where the broker and the
> > > > > controller share the same node id. When the controller replays this
> > > > > record, should it unregister the broker or the controller? The answer
> > > > > is not obvious. I think the only way to distinguish between broker
> > > > > and controller unregistrations is by checking whether `brokerEpoch`
> > > > > is set in the record, but that seems less intuitive than introducing
> > > > > a separate record. I have updated the KIP with this case too.
> > > >
> > > > Yes, we discussed both issues offline and decided to introduce a new
> > > > error for CONTROLER_ID_NOT_REGISTERED and a new unregistration
> > > > controller record. Because of the combined mode, a controller
> > > > registration and a broker registration can share the same node id.
> > > > Without a separate unregistration record, it is difficult to
> > > > determine during replay which entity is being unregistered.
> > > >
> > > > Thanks,
> > > > --
> > > > -José
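To illustrate the combined-mode ambiguity raised in JS2, here is a minimal sketch of replay with two distinct record types. `UnregisterBroker` stands in for the existing ApiKey 1 record (`UnregisterBrokerRecord`, which carries a broker id and a broker epoch) and `UnregisterController` for the proposed controller unregistration record. Both types, the `replay` method, and the in-memory sets are hypothetical simplifications, not Kafka's generated record classes. Records and `instanceof` patterns need Java 16 or later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: these types are simplified stand-ins,
// not Kafka's generated metadata record classes.
public class UnregistrationReplaySketch {

    interface MetadataRecord {}

    // Stand-in for the existing ApiKey 1 record, which targets a broker.
    record UnregisterBroker(int brokerId, long brokerEpoch) implements MetadataRecord {}

    // Stand-in for the proposed controller unregistration record.
    record UnregisterController(int controllerId) implements MetadataRecord {}

    // Registered node ids, tracked separately for brokers and controllers.
    final Set<Integer> brokers = new HashSet<>();
    final Set<Integer> controllers = new HashSet<>();

    void replay(MetadataRecord rec) {
        // The record type alone says which registration to remove, so replay
        // does not have to guess from whether brokerEpoch is set.
        if (rec instanceof UnregisterBroker b) {
            brokers.remove(b.brokerId());
        } else if (rec instanceof UnregisterController c) {
            controllers.remove(c.controllerId());
        } else {
            throw new IllegalArgumentException("Unknown record: " + rec);
        }
    }

    public static void main(String[] args) {
        UnregistrationReplaySketch state = new UnregistrationReplaySketch();
        // Combined node: the broker and the controller both use node id 1.
        state.brokers.add(1);
        state.controllers.add(1);

        state.replay(new UnregisterController(1));
        // prints "brokers=[1] controllers=[]"
        System.out.println("brokers=" + state.brokers + " controllers=" + state.controllers);
    }
}
```

With a single shared record type, the same `replay` call would have to infer the target from whether the epoch field is set, which is the less intuitive option described in JS2.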
