Hi Luke,

Thanks for the discussion. I have updated the KIP to mention the
INVALID_REQUEST error.
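
For illustration only, here is a minimal sketch of the check discussed in
this thread: the active controller refuses a request to unregister itself
with `INVALID_REQUEST`, and reports an unknown controller id with the
separate error code mentioned under JS1. It assumes the object below
represents the active controller. Every class, method, and enum name here is
hypothetical and is not taken from the Kafka code base or the KIP.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the actual Kafka controller implementation.
final class UnregisterControllerSketch {
    // Error names follow the spelling used in this thread.
    enum Result { NONE, INVALID_REQUEST, CONTROLER_ID_NOT_REGISTERED }

    // Assumption: this instance is the active controller (the KRaft leader).
    private final int activeControllerId;
    private final Set<Integer> registeredControllers;

    UnregisterControllerSketch(int activeControllerId, Set<Integer> registered) {
        this.activeControllerId = activeControllerId;
        this.registeredControllers = new HashSet<>(registered);
    }

    Result unregister(int targetId) {
        // The active controller never unregisters itself.
        if (targetId == activeControllerId) {
            return Result.INVALID_REQUEST;
        }
        // Set.remove returns false when the id was not registered.
        if (!registeredControllers.remove(targetId)) {
            return Result.CONTROLER_ID_NOT_REGISTERED;
        }
        return Result.NONE;
    }

    boolean isRegistered(int id) {
        return registeredControllers.contains(id);
    }
}
```

Note that the check only needs the active controller's own id, so it does
not require exposing the KRaft voter set to the metadata layer.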

Please let me know if you have any other questions or comments.

Best,
Kevin Wu

On Thu, May 14, 2026 at 9:36 PM Luke Chen <[email protected]> wrote:

> Hi Kevin,
>
> > What is the state that a voter is not registered in the cluster?
>
> > The state when a running voter X is not registered in the cluster is one
> > where a feature upgrade that would otherwise not be allowed can occur
> > before X completes its new registration. See JR2 for more discussion
> > around this race.
> > In general, unregistering a node assumes the node has already been taken
> > down. However, I think Kafka should be able to "recover" in the event of
> > an accidental unregistration. That is the motivation for "re-registering"
> > in the same process lifetime. Otherwise, we would need to restart the
> > controller node to register again, which could impact quorum
> > availability.
> > I think it would be nice to have the controller reject unregistering
> > voters, but that requires leaking the voter set, which is internal to
> > KRaft, to the metadata layer. See JR3 for more around that point.
>
> Fair enough.
>
>
> > What will happen if the unregistered voter is the leader controller?
>
> > I think this is something we can disallow without leaking the voter set
> > to the metadata layer. I think the reason why this KRaft state leak is
> > needed but not the voters (at least for now) is the active controller is
> > the KRaft leader and therefore can write new offsets to the log. Ideally,
> > the controllers should not be aware of the KRaft voters, because the
> > inactive controllers set is not necessarily the voter set besides the
> > leader.
> > If the active controller gets a request to unregister itself, it can
> > return an `INVALID_REQUEST` error. What do you think?
>
> Yes, at least we should reject the unregistration of the leader controller.
> Returning the `INVALID_REQUEST` error makes sense to me.
>
>
> Thanks,
> Luke
>
> On Fri, May 15, 2026 at 1:17 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Luke,
> >
> > Thanks for the reply and the questions.
> >
> > RE LC1:
> >
> > What is the state that a voter is not registered in the cluster?
> >
> > The state when a running voter X is not registered in the cluster is one
> > where a feature upgrade that would otherwise not be allowed can occur
> > before X completes its new registration. See JR2 for more discussion
> > around this race.
> > In general, unregistering a node assumes the node has already been taken
> > down. However, I think Kafka should be able to "recover" in the event of
> > an accidental unregistration. That is the motivation for "re-registering"
> > in the same process lifetime. Otherwise, we would need to restart the
> > controller node to register again, which could impact quorum
> > availability.
> > I think it would be nice to have the controller reject unregistering
> > voters, but that requires leaking the voter set, which is internal to
> > KRaft, to the metadata layer. See JR3 for more around that point.
> >
> > What will happen if the unregistered voter is the leader controller?
> >
> > I think this is something we can disallow without leaking the voter set
> > to the metadata layer. I think the reason why this KRaft state leak is
> > needed but not the voters (at least for now) is the active controller is
> > the KRaft leader and therefore can write new offsets to the log. Ideally,
> > the controllers should not be aware of the KRaft voters, because the
> > inactive controllers set is not necessarily the voter set besides the
> > leader.
> > If the active controller gets a request to unregister itself, it can
> > return an `INVALID_REQUEST` error. What do you think?
> >
> > Best,
> > Kevin Wu
> >
> >
> > On Thu, May 14, 2026 at 1:25 AM Luke Chen <[email protected]> wrote:
> >
> > > Hi Kevin,
> > >
> > > Sorry for the late review.
> > > I have a question:
> > >
> > > LC1. We said we can allow users to "unregister an active voter", but
> > > will re-register it later.
> > > What is the state that a voter is not registered in the cluster?
> > > What will happen if the unregistered voter is the leader controller?
> > > We added many protections when adding/removing voters to avoid breaking
> > > the quorum, so I think we should also have some validation to check if
> > > this is a voter before unregistering it.
> > >
> > >
> > > Thanks,
> > > Luke
> > >
> > > On Wed, May 13, 2026 at 8:20 PM José Armando García Sancio via dev <
> > > [email protected]> wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > On Tue, May 12, 2026 at 8:42 PM Kevin Wu <[email protected]> wrote:
> > > > > RE JS1: I like the idea of a separate `CONTROLER_ID_NOT_REGISTERED`
> > > > > error code for unregistering a controller which is not registered. I
> > > > > have updated the KIP with this.
> > > > >
> > > > > RE JS2: Another case where reusing the ApiKey 1 metadata record may
> > > > > not be a good idea is for a combined node, where the broker and
> > > > > controller share the same node id. When the controller replays this
> > > > > record, should it unregister the broker or the controller? The answer
> > > > > is not super obvious. I think the only way to distinguish between the
> > > > > broker and controller un-registrations is by looking if the
> > > > > `brokerEpoch` is set in the record, but that seems less intuitive than
> > > > > introducing a separate record. I have updated the KIP with this case
> > > > > too.
> > > >
> > > > Yes, we discussed both issues offline and decided to introduce a new
> > > > error for CONTROLER_ID_NOT_REGISTERED and a new unregistration
> > > > controller record. Because of the combined mode, a controller
> > > > registration and a broker registration can share the same node id.
> > > > Without a separate unregistration record, it is difficult to
> > > > determine during replay which entity is being unregistered.
> > > >
> > > > Thanks,
> > > > --
> > > > -José
> > > >
> > >
> >
>
