Thanks for the update. LGTM!

Luke
On Thu, Mar 12, 2026 at 6:59 PM Andrew Schofield <[email protected]> wrote:

> Hi Luke,
> Thanks for your questions.
>
> 1. It's a good question. If we considered metadata.recovery.strategy=NONE
> and metadata.cluster.check.enable=true to be conflicting configurations, an
> application configuring the former prior to KIP-1242 would fail when it
> upgrades to a KIP-1242 client. This might be acceptable at a major version
> bump, but I don't think it is on a minor version bump. I think there is
> value in checking by default without waiting for AK 5.0, so I don't want to
> make the configurations conflict.
>
> I suggest that disabling the checking if metadata.recovery.strategy=NONE
> (non-default since AK 4.0) is the easiest path forwards. What do you think?
>
> 2. If the broker doesn't support ApiVersions v5 or later, it will have no
> effect. Experience of logging "helpful" information as part of KIP-714 is
> that it is actually soon considered annoying, so I propose to document that
> the cluster check will only have an effect when connecting to brokers that
> support ApiVersions v5 or later (which is hopefully AK 4.4) and not log
> anything.
>
> I have made updates to the KIP. Please take a look.
>
> Thanks,
> Andrew
>
> On 2026/03/12 06:03:02 Luke Chen wrote:
> > Hi Andrew,
> >
> > Thanks for the KIP!
> >
> > Questions:
> > 1. If the client doesn't enable metadata recovery (i.e.
> > metadata.recovery.strategy=NONE), what will happen if
> > `metadata.cluster.check.enable=true` and a REBOOTSTRAP_REQUIRED error is
> > received? Should we fail fast when users use this config combination?
> >
> > 2. We set `metadata.cluster.check.enable` to true by default now. What
> > happens if the broker doesn't support ApiVersions v5 or later? It
> > should have no effect, right? Should we document or log something about
> > it? Otherwise, the config will confuse users.
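[For reference, a sketch of the client configuration combination under discussion. The property names are taken from the thread above; the broker addresses are placeholders, and the commented behaviour reflects Andrew's suggestion rather than a finalised design.]

```properties
# Hypothetical client properties combining the two settings discussed above.
# metadata.cluster.check.enable is the new configuration proposed by KIP-1242;
# metadata.recovery.strategy=NONE disables re-bootstrapping (non-default since
# AK 4.0). Under the suggestion in this thread, NONE would simply disable the
# cluster check rather than being treated as a conflicting configuration.
bootstrap.servers=broker1:9092,broker2:9092
metadata.recovery.strategy=NONE
metadata.cluster.check.enable=true
```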
> >
> > Thank you,
> > Luke
> >
> > On Mon, Mar 9, 2026 at 8:55 AM Gaurav Narula <[email protected]> wrote:
> >
> > > Hi Andrew,
> > >
> > > Thank you for the KIP. I welcome the suggestion, as I've run into a
> > > version of this problem in the past which I'd like to share for posterity.
> > >
> > > I've run into situations where requests to the controller sent via
> > > NodeToControllerChannelManagerImpl failed with authentication exceptions.
> > > On debugging, it was found that NodeToControllerChannelManagerImpl cached
> > > the controller node whose advertised address had changed, and the cached
> > > entry referred to a node in another cluster. My fix then was to propose
> > > https://github.com/apache/kafka/pull/14760, but there quite likely exists
> > > a gap for situations where users don't face an auth exception (perhaps
> > > clusters share auth?) in the same code path. I believe this KIP should
> > > close that gap and allow for better error handling of such scenarios.
> > >
> > > Thanks once again!
> > >
> > > Regards,
> > > Gaurav
> > >
> > > > On 3 Mar 2026, at 10:52, Andrew Schofield <[email protected]> wrote:
> > > >
> > > > Thinking about this some more, I have changed the error code on receipt
> > > > of an incorrect cluster ID to REBOOTSTRAP_REQUIRED, matching incorrect
> > > > node ID. This is because I have heard of situations in which people use
> > > > rebootstrapping to switch clusters for recovery purposes, so it's
> > > > important that a retriable error is used. Logging on client and server
> > > > will indicate when the checks fail, so the KIP's aim of making
> > > > misconfiguration diagnosis easier will be satisfied while making the
> > > > clients tolerant of intentional changes which should drive
> > > > rebootstrapping.
> > > >
> > > > Unless there are further comments, I will start voting on this KIP next
> > > > week.
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On 2026/03/02 18:36:07 Rajini Sivaram wrote:
> > > >> Hi Andrew,
> > > >>
> > > >> Thanks for the update, looks good.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Rajini
> > > >>
> > > >> On Mon, Mar 2, 2026 at 1:57 PM Andrew Schofield <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi Rajini,
> > > >>> Thanks for your comments.
> > > >>>
> > > >>> I have changed the KIP such that the client discards cluster ID and
> > > >>> node information when rebootstrapping begins.
> > > >>>
> > > >>> I have also added a common client configuration to disable sending of
> > > >>> the cluster ID and node ID information, just in case there is a
> > > >>> situation in which the assumptions behind this KIP do not apply to an
> > > >>> existing deployment.
> > > >>>
> > > >>> Thanks,
> > > >>> Andrew
> > > >>>
> > > >>> On 2026/03/02 12:09:46 Rajini Sivaram wrote:
> > > >>>> Hi Andrew,
> > > >>>>
> > > >>>> Thanks for the KIP.
> > > >>>>
> > > >>>> The KIP says:
> > > >>>> If the client is bootstrapping, it does not supply ClusterId or
> > > >>>> NodeId. After bootstrapping, during which it learns the information
> > > >>>> from its initial Metadata response, it supplies both.
> > > >>>>
> > > >>>> It will be good to clarify the behaviour during re-bootstrapping. We
> > > >>>> clear the current metadata during re-bootstrap and revert to
> > > >>>> bootstrap metadata. At this point, we don't retain node IDs or the
> > > >>>> cluster ID from previous metadata responses. I think this makes
> > > >>>> sense because we want re-bootstrapping to behave in the same way as
> > > >>>> the first bootstrap. If we retain this behaviour, validation of
> > > >>>> cluster ID and node ID will be based on the Metadata response of the
> > > >>>> last bootstrap, which is not necessarily the initial Metadata
> > > >>>> response.
> > > >>>> I think this is the desired behaviour; can we clarify it in the
> > > >>>> KIP?
> > > >>>>
> > > >>>> Kafka clients have always supported cluster ID change without
> > > >>>> requiring restart. Do we need an opt-out in case some deployments
> > > >>>> rely on this feature? If re-bootstrapping is enabled, clients would
> > > >>>> re-bootstrap if connections consistently fail. So as long as we
> > > >>>> continue to clear old metadata on re-bootstrap, we should be fine.
> > > >>>> Not sure if we need an explicit opt-out for the case where
> > > >>>> re-bootstrapping is disabled.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Rajini
> > > >>>>
> > > >>>> On Thu, Feb 12, 2026 at 1:43 PM Andrew Schofield
> > > >>>> <[email protected]> wrote:
> > > >>>>
> > > >>>>> Hi David,
> > > >>>>> Thanks for your question.
> > > >>>>>
> > > >>>>> Here's one elderly JIRA I've unearthed which is related:
> > > >>>>> https://issues.apache.org/jira/browse/KAFKA-15828.
> > > >>>>>
> > > >>>>> I am also aware of suspected problems in the networking for cloud
> > > >>>>> providers, which occasionally seem to route connections to the
> > > >>>>> wrong place.
> > > >>>>>
> > > >>>>> The KIP is aiming to get some basic diagnosis and recovery into the
> > > >>>>> Kafka protocol where today there is none. As you can imagine, there
> > > >>>>> is total mayhem when a client confidently thinks it's talking to
> > > >>>>> one broker when actually it's talking to quite another. Diagnosis
> > > >>>>> of this kind of problem would really help in getting to the bottom
> > > >>>>> of rare issues such as these.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Andrew
> > > >>>>>
> > > >>>>> On 2026/02/11 16:12:50 David Arthur wrote:
> > > >>>>>> Thanks for the KIP, Andrew.
> > > >>>>>> I'm all for making the client more robust against networking and
> > > >>>>>> deployment weirdness.
> > > >>>>>>
> > > >>>>>> I'm not sure I fully grok the scenario you are covering here. It
> > > >>>>>> sounds like you're guarding against a hostname being reused by a
> > > >>>>>> different broker. Does the client not learn about the new broker
> > > >>>>>> hostnames when it refreshes metadata periodically?
> > > >>>>>>
> > > >>>>>> -David
> > > >>>>>>
> > > >>>>>> On Thu, Nov 20, 2025 at 5:59 AM Andrew Schofield
> > > >>>>>> <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>> I would like to start discussion of a new KIP for detecting and
> > > >>>>>>> handling misrouted connections from Kafka clients. The Kafka
> > > >>>>>>> protocol does not contain any information for working out when
> > > >>>>>>> the broker metadata information in a client is inconsistent or
> > > >>>>>>> stale. This KIP proposes a way to address this.
> > > >>>>>>>
> > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1242%3A+Detection+and+handling+of+misrouted+connections
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Andrew
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> David Arthur
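[To make the behaviour discussed in this thread concrete, here is a small, hypothetical sketch; it is not actual Kafka client code, and the class and method names are invented for illustration. It captures the semantics described above: the client remembers the cluster ID from its first Metadata response after (re)bootstrap, discards it when re-bootstrapping begins, and flags a mismatch so the client can surface a retriable REBOOTSTRAP_REQUIRED error.]

```java
import java.util.Optional;

// Hypothetical sketch of the client-side cluster-ID check described in the
// thread; the real KIP-1242 implementation differs.
final class ClusterIdCheck {
    private Optional<String> knownClusterId = Optional.empty();

    /** Remember the cluster ID from the first Metadata response after (re)bootstrap. */
    void onMetadataResponse(String clusterId) {
        if (knownClusterId.isEmpty()) {
            knownClusterId = Optional.of(clusterId);
        }
    }

    /** Discard cached identity when re-bootstrapping begins, per the thread above. */
    void onRebootstrap() {
        knownClusterId = Optional.empty();
    }

    /** True if the broker's cluster ID conflicts with what this client learned. */
    boolean isMisrouted(String brokerClusterId) {
        return knownClusterId.map(id -> !id.equals(brokerClusterId)).orElse(false);
    }
}
```

On a mismatch, the client would treat the condition as retriable (REBOOTSTRAP_REQUIRED) rather than fatal, so that deliberate cluster switches via re-bootstrapping keep working.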
