Thanks for the update. LGTM!

Luke

On Thu, Mar 12, 2026 at 6:59 PM Andrew Schofield <[email protected]>
wrote:

> Hi Luke,
> Thanks for your questions.
>
> 1. It's a good question. If we considered metadata.recovery.strategy=NONE
> and metadata.cluster.check.enable=true to be conflicting configurations, an
> application configuring the former prior to KIP-1242 would fail when it
> upgrades to a KIP-1242 client. This might be acceptable at a major version
> bump, but I don't think it is on a minor version bump. I think there is
> value in checking by default without waiting for AK 5.0, so I don't want to
> make the configurations conflict.
>
> I suggest that disabling the check when metadata.recovery.strategy=NONE
> (non-default since AK 4.0) is the easiest path forward. What do you think?
>
> 2. If the broker doesn't support ApiVersions v5 or later, it will have no
> effect. Experience with logging "helpful" information as part of KIP-714
> shows that it soon becomes annoying, so I propose to document that the
> cluster check will only have an effect when connecting to brokers that
> support ApiVersions v5 or later (which is hopefully AK 4.4), and not to log
> anything.
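>
> To summarise the interaction (a sketch only, using the configuration
> names proposed in the KIP; per the above, NONE has been non-default
> since AK 4.0):
>
>     # Defaults: the cluster check is enabled, but only takes effect
>     # against brokers that support ApiVersions v5 or later.
>     metadata.recovery.strategy=rebootstrap
>     metadata.cluster.check.enable=true
>
>     # Setting the recovery strategy to NONE also disables the check:
>     # metadata.recovery.strategy=NONE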
>
>
> I have made updates to the KIP. Please take a look.
>
> Thanks,
> Andrew
>
> On 2026/03/12 06:03:02 Luke Chen wrote:
> > Hi Andrew,
> >
> > Thanks for the KIP!
> >
> > Questions:
> > 1. If the client doesn't enable rebootstrapping (i.e.
> > metadata.recovery.strategy=NONE), what will happen if
> > `metadata.cluster.check.enable=true` and a REBOOTSTRAP_REQUIRED error is
> > received? Should we fail fast when users use this config combination?
> >
> > 2. We set `metadata.cluster.check.enable` to true by default now. What
> > happens if the broker doesn't support ApiVersions v5 or later? It should
> > have no effect, right? Should we document or log something about it?
> > Otherwise, the config will confuse users.
> >
> > Thank you,
> > Luke
> >
> > On Mon, Mar 9, 2026 at 8:55 AM Gaurav Narula <[email protected]> wrote:
> >
> > > Hi Andrew,
> > >
> > > Thank you for the KIP. I welcome the suggestion as I've run into a
> > > version of this problem in the past which I’d like to share for
> > > posterity.
> > >
> > > I've run into situations where requests to the controller sent via
> > > NodeToControllerChannelManagerImpl failed with authentication
> > > exceptions. On debugging it was found that
> > > NodeToControllerChannelManagerImpl cached the controller node whose
> > > advertised address had changed and the cached entry referred to a node
> > > in another cluster. My fix then was to propose
> > > https://github.com/apache/kafka/pull/14760 but there quite likely
> > > exists a gap for situations where users don’t face an auth exception
> > > (perhaps clusters share auth?) in the same code path. I believe this
> > > KIP should close that gap and allow for better error handling of such
> > > scenarios.
> > >
> > > Thanks once again!
> > >
> > > Regards,
> > > Gaurav
> > >
> > > > On 3 Mar 2026, at 10:52, Andrew Schofield <[email protected]>
> > > > wrote:
> > > >
> > > > Thinking about this some more, I have changed the error code on
> > > > receipt of an incorrect cluster ID to REBOOTSTRAP_REQUIRED, matching
> > > > incorrect node ID. This is because I have heard of situations in
> > > > which people use rebootstrapping to switch clusters for recovery
> > > > purposes, so it's important that a retriable error is used. Logging
> > > > on client and server will indicate when the checks fail, so the
> > > > KIP's aim of making misconfiguration diagnosis easier will be
> > > > satisfied while making the clients tolerant of intentional changes
> > > > which should drive rebootstrapping.
> > > >
> > > > Unless there are further comments, I will start voting on this KIP
> > > > next week.
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On 2026/03/02 18:36:07 Rajini Sivaram wrote:
> > > >> Hi Andrew,
> > > >>
> > > >> Thanks for the update, looks good.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Rajini
> > > >>
> > > >> On Mon, Mar 2, 2026 at 1:57 PM Andrew Schofield
> > > >> <[email protected]> wrote:
> > > >>
> > > >>> Hi Rajini,
> > > >>> Thanks for your comments.
> > > >>>
> > > >>> I have changed the KIP such that the client discards cluster ID
> > > >>> and node information when rebootstrapping begins.
> > > >>>
> > > >>> I have also added a common client configuration to disable sending
> > > >>> of the cluster ID and node ID information, just in case there is a
> > > >>> situation in which the assumptions behind this KIP do not apply to
> > > >>> an existing deployment.
> > > >>>
> > > >>> Thanks,
> > > >>> Andrew
> > > >>>
> > > >>> On 2026/03/02 12:09:46 Rajini Sivaram wrote:
> > > >>>> Hi Andrew,
> > > >>>>
> > > >>>> Thanks for the KIP.
> > > >>>>
> > > >>>> The KIP says:
> > > >>>> If the client is bootstrapping, it does not supply ClusterId or
> > > >>>> NodeId. After bootstrapping, during which it learns the
> > > >>>> information from its initial Metadata response, it supplies both.
> > > >>>>
> > > >>>> It will be good to clarify the behaviour during re-bootstrapping.
> > > >>>> We clear the current metadata during re-bootstrap and revert to
> > > >>>> bootstrap metadata. At this point, we don't retain node IDs or
> > > >>>> the cluster ID from previous metadata responses. I think this
> > > >>>> makes sense because we want re-bootstrapping to behave in the
> > > >>>> same way as the first bootstrap. If we retain this behaviour,
> > > >>>> validation of cluster ID and node ID will be based on the
> > > >>>> Metadata response of the last bootstrap, which is not necessarily
> > > >>>> the initial Metadata response. I think this is the desired
> > > >>>> behaviour; can we clarify in the KIP?
> > > >>>>
> > > >>>> Kafka clients have always supported cluster ID change without
> > > >>>> requiring restart. Do we need an opt-out in case some deployments
> > > >>>> rely on this feature? If re-bootstrapping is enabled, clients
> > > >>>> would re-bootstrap if connections consistently fail. So as long
> > > >>>> as we continue to clear old metadata on re-bootstrap, we should
> > > >>>> be fine. Not sure if we need an explicit opt-out for the case
> > > >>>> where re-bootstrapping is disabled.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Rajini
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Feb 12, 2026 at 1:43 PM Andrew Schofield
> > > >>>> <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hi David,
> > > >>>>> Thanks for your question.
> > > >>>>>
> > > >>>>> Here's one elderly JIRA I've unearthed which is related:
> > > >>>>> https://issues.apache.org/jira/browse/KAFKA-15828
> > > >>>>>
> > > >>>>> I am also aware of suspected problems in the networking for
> > > >>>>> cloud providers which occasionally seem to route connections to
> > > >>>>> the wrong place.
> > > >>>>>
> > > >>>>> The KIP is aiming to get some basic diagnosis and recovery into
> > > >>>>> the Kafka protocol where today there is none. As you can
> > > >>>>> imagine, there is total mayhem when a client confidently thinks
> > > >>>>> it's talking to one broker when actually it's talking to quite
> > > >>>>> another. Diagnosis of this kind of problem would really help in
> > > >>>>> getting to the bottom of rare issues such as these.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Andrew
> > > >>>>>
> > > >>>>> On 2026/02/11 16:12:50 David Arthur wrote:
> > > >>>>>> Thanks for the KIP, Andrew. I'm all for making the client more
> > > >>>>>> robust against networking and deployment weirdness.
> > > >>>>>>
> > > >>>>>> I'm not sure I fully grok the scenario you are covering here.
> > > >>>>>> It sounds like you're guarding against a hostname being reused
> > > >>>>>> by a different broker. Does the client not learn about the new
> > > >>>>>> broker hostnames when it refreshes metadata periodically?
> > > >>>>>>
> > > >>>>>> -David
> > > >>>>>>
> > > >>>>>> On Thu, Nov 20, 2025 at 5:59 AM Andrew Schofield
> > > >>>>>> <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>> I would like to start discussion of a new KIP for detecting
> > > >>>>>>> and handling misrouted connections from Kafka clients. The
> > > >>>>>>> Kafka protocol does not contain any information for working
> > > >>>>>>> out when the broker metadata information in a client is
> > > >>>>>>> inconsistent or stale. This KIP proposes a way to address
> > > >>>>>>> this.
> > > >>>>>>>
> > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1242%3A+Detection+and+handling+of+misrouted+connections
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Andrew
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> David Arthur
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>
