Hi Kevin,

Thanks for the response.

LC2: OK, that matches what I thought. I think it's fine because in the ZK
era, brokers also relied on the `zookeeper.connect` config to get all the
cluster metadata. It seems it never caused brokers to connect to the wrong
ZK cluster.
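
To double-check my reading of the rule, here is a minimal sketch in Python
(not Kafka code). The `Leader`, `Node`, and `InconsistentClusterIdError`
names are made up for illustration, and an in-memory field stands in for
meta.properties. It assumes the leader accepts a fetch when the request
cluster id is absent or matches its own, and that a node persists the id
the first time it discovers one.

```python
from dataclasses import dataclass
from typing import Optional


class InconsistentClusterIdError(Exception):
    """Hypothetical error for a fetch that names a different cluster."""


@dataclass
class Leader:
    cluster_id: str

    def handle_fetch(self, request_cluster_id: Optional[str]) -> str:
        # Accept when the request carries no cluster id, or when it matches ours.
        if request_cluster_id is not None and request_cluster_id != self.cluster_id:
            raise InconsistentClusterIdError(request_cluster_id)
        return self.cluster_id


@dataclass
class Node:
    cluster_id: Optional[str] = None  # None models a node that skipped formatting

    def fetch_from(self, leader: Leader) -> None:
        discovered = leader.handle_fetch(self.cluster_id)
        if self.cluster_id is None:
            # First discovery: persist the id (here, just remember it).
            self.cluster_id = discovered


node = Node()                          # unformatted node
node.fetch_from(Leader("cluster-A"))   # adopts whichever cluster it reaches first
try:
    node.fetch_from(Leader("cluster-B"))
except InconsistentClusterIdError:
    print("rejected by cluster-B after persisting", node.cluster_id)
```

So in this model the only window for talking to the "wrong" cluster is the
first fetch of an unformatted node, which is the same exposure a
misconfigured `zookeeper.connect` had.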

LC4: I see. Thanks.

Luke

On Fri, Mar 13, 2026 at 2:54 PM Kevin Wu <[email protected]> wrote:

> Hi Luke,
>
> Thanks for the reply.
>
> RE LC1 + LC3: Sure, I will update the KIP.
>
> RE LC2: Yes, when a node does not have a cluster.id set, it must discover
> it via Fetch. This means that a node can talk to the "wrong" cluster at
> this point. However, upon discovering and persisting a cluster.id, a node
> will not be able to successfully fetch from a KRaft leader with a different
> cluster.id. The only way to prevent this case would be to format the
> node with a cluster id.
>
> RE LC4: It is not possible for meta.properties V1 to exist without a
> cluster.id. Look at MetaProperties.Builder#build() and
> MetaPropertiesEnsemble#verify().
>
> Best,
> Kevin Wu
>
> On Thu, Mar 12, 2026 at 9:48 AM Luke Chen <[email protected]> wrote:
>
> > Hi Kevin,
> >
> > Thanks for the KIP.
> >
> > Comments:
> > LC1. It would be good to show the schema of meta.properties v2 and how
> > it differs from v1, like we do for API changes.
> >
> > LC2. Before this KIP, formatting nodes with the correct cluster ID
> > makes sure the brokers/observer controllers will talk to the expected
> > controllers. But after this KIP, it's possible that brokers/observer
> > controllers without formatting could connect to the wrong cluster's
> > controllers, discover the cluster ID from the fetch response, and
> > persist it. Is this correct? Looks like this is the risk users need to
> > take if they don't want to format them?
> >
> > LC3. "If this ID doesn't match the KRaft leader's the leader will reject
> > requests from the node"
> > There is a missing word in the sentence.
> >
> > LC4. "If meta.properties exists without a cluster.id and is V2, it will
> > be discovered later (described below)"
> > Is it possible the meta.properties exists without a cluster.id but it's
> > in V1? If so, how will we handle it?
> >
> > Thank you,
> > Luke
> >
> > On Wed, Mar 4, 2026 at 2:39 AM Jun Rao via dev <[email protected]>
> > wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the reply. The KIP looks good to me now.
> > >
> > > Jun
> > >
> > > On Tue, Mar 3, 2026 at 4:54 AM Kevin Wu <[email protected]>
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > RE JR1: "If an existing string can't be converted to uuid, we can
> > > > fail the node. This shouldn't happen for a well formatted cluster,
> > > > right?"
> > > > Currently, you can format a cluster with a non-UUID cluster ID
> > > > string, and kafka considers this "well-formatted" (i.e. formatting
> > > > code accepts String, server startup works, and clusterId is a
> > > > String in-memory etc.). Our documentation references formatting
> > > > with a UUID cluster id generated via `kafka-storage random-uuid`,
> > > > but this is not a requirement in the code. If we make this record
> > > > have a UUID to be consistent with TopicRecord, it is not clear to
> > > > me what the MV upgrade path is for existing clusters that were
> > > > formatted with a non-UUID String in `meta.properties`. We would
> > > > have to write a new UUID cluster id, which violates the invariant
> > > > that the cluster id cannot change over the lifetime of a cluster.
> > > >
> > > > RE JR6: I plan on still requiring bootstrap controllers to format.
> > > > This means we should not expect a leader to be elected who does not
> > > > have a cluster id. Bootstrap controllers will fail when reading in
> > > > meta.properties in KafkaRaftServer. I will remove this section.
> > > >
> > > > RE JR7: Apologies, I mixed up the numbers with another KIP.
> > > >
> > > > RE JR8:
> > > > For brokers, the readers of cluster id during startup are the
> > > > BrokerLifecycleManager, KafkaApis, DynamicTopicClusterQuotaPublisher,
> > > > and endpointReadyFutures. It is okay to block startup on fetching
> > > > the cluster id from KRaft, since we also block startup on broker
> > > > lifecycle manager initial catch up future. Discovering the cluster
> > > > id value for the first time would only require a single
> > > > FetchSnapshot or a Fetch of the bootstrap metadata records.
> > > >
> > > > For controllers, the readers are endpointReadyFutures,
> > > > QuorumController, ControllerApis, ControllerRegistrationManager, and
> > > > DynamicTopicClusterQuotaPublisher. For bootstrap controllers, this
> > > > blocking does not occur. For observers, they are essentially brokers
> > > > from the perspective of KRaft, so I think it is okay to block even
> > > > the initialization of QuorumController until the cluster id is
> > > > discovered. Just like with brokers, we only block for 1 successful
> > > > Fetch/FetchSnapshot loop until this data is known. One detail is
> > > > that for auto-joining observers in kraft.version=1, they need to
> > > > wait until they persist cluster id before they try to join the
> > > > voter set.
> > > >
> > > > RE JR9.1:
> > > > This can also mean the broker skipped formatting, and does not have
> > > > a cluster id. In this case, it will persist cluster id to
> > > > meta.properties.
> > > >
> > > > The other case is when the broker has a cluster.id in
> > > > meta.properties. In this case, the broker cannot discover a
> > > > different cluster id via a ClusterIdRecord in FetchResponse. In
> > > > fact, the broker will not be able to successfully complete any
> > > > KRaft RPCs against the leader. For the broker to receive a
> > > > non-error FetchResponse with metadata records (which would be the
> > > > only way to learn of a different ClusterIdRecord), the KRaft leader
> > > > checks that the request cluster id is absent, or that the request
> > > > cluster id matches its own (which is the cluster id in its
> > > > meta.properties/ClusterIdRecord if the invariant I mentioned in my
> > > > previous message is enforced properly). This case could happen when
> > > > bootstrap endpoints point to the wrong cluster during restart of a
> > > > node. The logic above would result in startup timing out and
> > > > shutting down the node because the local node is not able to
> > > > participate in KRaft for another cluster.
> > > >
> > > > RE JR9.2: Yes, the broker's startup will eventually time out and
> > > > fail. The broker won't have cluster.id in meta.properties, and the
> > > > cluster cannot send the broker a cluster id via ClusterIdRecord.
> > > > The same would apply for an observer controller. This is a
> > > > misconfiguration in my opinion.
> > > >
> > > > On Tue, Mar 3, 2026 at 12:22 AM Jun Rao via dev <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi, Kevin,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > JR1. ClusterIdRecord:
> > > > > It would be better for ClusterId to have the type uuid. This will
> > > > > make it consistent with topicId in TopicRecord. If an existing
> > > > > string can't be converted to uuid, we can fail the node. This
> > > > > shouldn't happen for a well formatted cluster, right?
> > > > >
> > > > > JR6. Have you decided what to include in this KIP? If this KIP
> > > > > still requires the formatting for bootstrap controllers, what's
> > > > > described here can't happen.
> > > > >
> > > > > JR7. "After KIP-1286, kafka operators no longer need to format all
> > > > > nodes"
> > > > > KIP-1286 seems to be the wrong KIP?
> > > > >
> > > > > JR8. "The readers of cluster id initialized during startup can wait
> > > > > for both the above before being initialized."
> > > > > What are those readers? Are they ok to block?
> > > > >
> > > > > JR9. A couple more upgrade scenarios.
> > > > > JR9.1 If the MV has been bumped, after a broker starts up, it
> > > > > discovers that the clusterId in ClusterIdRecord doesn't match the
> > > > > one in meta.properties. Will the broker fail?
> > > > > JR9.2 If the MV hasn't been bumped, a new broker with the new
> > > > > version of the software is started without formatting, will it fail
> > > > > during startup?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Feb 18, 2026 at 8:49 AM Kevin Wu <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the replies and questions.
> > > > > >
> > > > > > RE JR1: Updated the KIP with the record schema for
> > > > > > ClusterIdRecord. One thing I'm not sure about yet is whether or
> > > > > > not the record field should be of UUID or String type. This is
> > > > > > because kafka's quickstart docs refer to setting `--cluster-id`
> > > > > > to a UUID in the storage tool. However, many places in kafka
> > > > > > broker/controller code (e.g. the raft client, broker lifecycle
> > > > > > manager, and even the formatter itself) only require this type
> > > > > > to be a String. Since not all Strings are valid UUIDs, making
> > > > > > this record field of type UUID might be too restrictive and
> > > > > > complicate upgrading the MV for existing clusters, since they
> > > > > > might have a non-UUID cluster id string, but need to write this
> > > > > > record when upgrading to an MV that supports this feature. Let
> > > > > > me know what you think.
> > > > > >
> > > > > > RE JR2: Any controller node formatted with `--standalone` or
> > > > > > `--initial-controllers`, or who is part of the static voter set
> > > > > > defined by `controller.quorum.voters`, can write the
> > > > > > ClusterIdRecord by including the `--cluster-id` argument to
> > > > > > `kafka-storage format`. However, if the MV of the cluster
> > > > > > supports it, there is exactly one writer of this record to the
> > > > > > cluster metadata partition. The writer is the first active
> > > > > > controller, who writes this record alongside other bootstrap
> > > > > > metadata records (e.g. metadata version) during controller
> > > > > > activation. At this point, we already depend on MV existing,
> > > > > > since the active controller writes these bootstrap metadata
> > > > > > records as a transaction if the MV supports it. I think writing
> > > > > > the cluster id record would follow a similar pattern.
> > > > > >
> > > > > > RE JR3: When a node formats, it will write the meta.properties
> > > > > > file. During formatting, a node must resolve the MV it wants to
> > > > > > format with, which is explained more in RE JR5. I need to think
> > > > > > about this more, but I think we should keep `--cluster-id` as a
> > > > > > required flag for invoking the format command. If a
> > > > > > broker/observer controller does not format, meta.properties is
> > > > > > written without cluster id immediately after startup (i.e.
> > > > > > where we read it from disk now in KafkaRaftServer).
> > > > > >
> > > > > > RE JR4: Yeah, will do. In this context, when I say observers
> > > > > > I'm referring to any controllers who are not part of the KRaft
> > > > > > voter set when they start kafka, or any brokers. I will make
> > > > > > this explicit in the KIP. From the perspective of this feature
> > > > > > and KRaft leader election, controller nodes who format with
> > > > > > `--no-initial-controllers`, controller nodes who are not part
> > > > > > of `controller.quorum.voters`, and brokers, all do not "need"
> > > > > > to format, since they cannot become the active controller. This
> > > > > > means they can resolve metadata like the cluster id after
> > > > > > discovering the leader. We have a similar pattern with how
> > > > > > controller nodes who format with `--no-initial-controllers`
> > > > > > discover the kraft version of the cluster.
> > > > > >
> > > > > > RE JR5: If a node formats, it must resolve a metadata version
> > > > > > with which to format. This comes from the
> > > > > > `--release-version/--feature` flag and defaults to the latest
> > > > > > production MV. Therefore, when a node formats with a metadata
> > > > > > version that supports this feature, it will write the
> > > > > > ClusterIdRecord to its `0-0/bootstrap.checkpoint`. If the node
> > > > > > formats with a metadata version that does not support this
> > > > > > feature, it does not write ClusterIdRecord to its
> > > > > > `0-0/bootstrap.checkpoint`. If a node skips formatting, it is
> > > > > > assumed that this node is part of a cluster whose MV supports
> > > > > > this. Otherwise, this is a misconfiguration and the node will
> > > > > > fail to register with the leader since there is no way for it
> > > > > > to persist cluster id to its meta.properties without
> > > > > > formatting.
> > > > > >
> > > > > > Although I did not specify this yet on the KIP explicitly,
> > > > > > after some offline discussion I think it makes sense to enforce
> > > > > > the following invariant as part of the feature design: if the
> > > > > > persisted metadata version supports this feature, the ClusterId
> > > > > > record must also be persisted. This is enforceable on the
> > > > > > write-path for MV, which occurs at two points: during
> > > > > > formatting and during feature upgrades. There is a similar
> > > > > > pattern with kraft.version, as it gets written to disk at the
> > > > > > same two points.
> > > > > >
> > > > > > RE JR6: The main motivation for writing cluster id to
> > > > > > meta.properties as well is because it can act as a projection
> > > > > > of the cluster metadata partition which essentially only
> > > > > > exposes the cluster id to readers. For example, the raft layer
> > > > > > needs to be aware of the cluster id for its own RPC
> > > > > > handling/validation, but raft cannot read metadata records.
> > > > > > There are many readers of this cluster id value during the
> > > > > > startup of the cluster. Therefore, avoiding a read of the
> > > > > > metadata partition to discover the value of this metadata will
> > > > > > prevent more complications of the startup code.
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 17, 2026 at 7:35 PM Jun Rao via dev <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi, Kevin,
> > > > > > >
> > > > > > > Thanks for the KIP. A few comments.
> > > > > > >
> > > > > > > JR1. ClusterIdRecord : Could you define the record format?
> > > > > > >
> > > > > > > JR2. "a new MetadataVersion that supports encoding/decoding
> > > > > > > this record. This means that during formatting, the bootstrap
> > > > > > > ClusterIdRecord is only written if the cluster is formatted
> > > > > > > with a MV that supports this feature."
> > > > > > > Could you describe who writes the ClusterIdRecord? Is it the
> > > > > > > leader controller? Also, when is the record written? Do we
> > > > > > > guarantee that MV is available at that time?
> > > > > > >
> > > > > > > JR3. "meta.properties can be written during kafka
> > > > > > > broker/controller startup if it doesn't exist already (from
> > > > > > > formatting)"
> > > > > > > Could you describe when meta.properties is written? Is MV
> > > > > > > available at that time?
> > > > > > >
> > > > > > > JR4. "Introduce a metadata record for cluster id + observers
> > > > > > > persist cluster id to meta.properties from metadata publishing
> > > > > > > pipeline"
> > > > > > > Could you clarify what observers are? Are they observer
> > > > > > > controllers or are they brokers (which are referred to as
> > > > > > > observers to the controller)?
> > > > > > >
> > > > > > > JR5. "Bootstrap controllers can add a mandatory “cluster id”
> > > > > > > record during formatting"
> > > > > > > This sounds like adding a ClusterIdRecord is optional. If so,
> > > > > > > could you describe when a record will be added and when a
> > > > > > > record will not be added?
> > > > > > >
> > > > > > > JR6. "However, kafka should still be able to handle the case
> > > > > > > where a leader is elected without a cluster id in
> > > > > > > meta.properties, since KRaft does not need cluster.id in order
> > > > > > > to elect a leader. In this case, the active controller will
> > > > > > > write a cluster id record during the bootstrap metadata write."
> > > > > > > Hmm, earlier, the KIP says "Upon discovering the cluster ID
> > > > > > > for the first time, these nodes need to persist this to
> > > > > > > meta.properties". Why do we need to introduce a separate place
> > > > > > > to write the cluster id to meta.properties?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Feb 11, 2026 at 10:21 AM Kevin Wu <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Manually bumping this thread after finalizing a design.
> > > > > > > > KIP link:
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Kevin Wu
> > > > > > > >
> > > > > > > > On Tue, Jan 6, 2026 at 7:18 AM Kevin Wu <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion on KIP-1262, which
> > > > > > > > > proposes removing the formatting requirement for brokers
> > > > > > > > > and observer controllers. Currently, I am considering two
> > > > > > > > > high-level designs, and would appreciate community
> > > > > > > > > feedback on both approaches to decide on a final design.
> > > > > > > > >
> > > > > > > > > KIP link:
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Kevin Wu
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
