Hi Jun,

Thanks for the reply.

RE JR1: "If an existing string can't be converted to uuid, we can fail the node. This shouldn't happen for a well formatted cluster, right?"

Currently, you can format a cluster with a non-UUID cluster ID string, and Kafka considers this "well-formatted" (i.e. the formatting code accepts a String, server startup works, and clusterId is a String in memory). Our documentation references formatting with a UUID cluster id generated via `kafka-storage random-uuid`, but the code does not require this. If we make this record field a UUID to be consistent with TopicRecord, it is not clear to me what the MV upgrade path is for existing clusters that formatted `meta.properties` with a non-UUID String. We would have to write a new UUID cluster id, which violates the invariant that the cluster id cannot change over the lifetime of a cluster.

RE JR6: I still plan on requiring bootstrap controllers to format. This means we should not expect a leader to be elected who does not have a cluster id. Bootstrap controllers that skipped formatting will fail when reading meta.properties in KafkaRaftServer. I will remove this section.

RE JR7: Apologies, I mixed up the numbers with another KIP.

RE JR8: For brokers, the readers of cluster id during startup are the BrokerLifecycleManager, KafkaApis, DynamicTopicClusterQuotaPublisher, and endpointReadyFutures. It is okay to block startup on fetching the cluster id from KRaft, since we already block startup on the BrokerLifecycleManager's initial catch-up future. Discovering the cluster id value for the first time would only require a single FetchSnapshot or a Fetch of the bootstrap metadata records.

For controllers, the readers are endpointReadyFutures, QuorumController, ControllerApis, ControllerRegistrationManager, and DynamicTopicClusterQuotaPublisher. For bootstrap controllers, this blocking does not occur.

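As a rough illustration of the blocking described for RE JR8, here is a minimal Python sketch (language chosen only for brevity; this is not Kafka code). `ClusterIdHolder`, `on_discovered`, and `await_cluster_id` are hypothetical names, and treating a second, different id as an error is an assumption based on the invariant that the cluster id cannot change.

```python
from concurrent.futures import Future, TimeoutError


class ClusterIdHolder:
    """Hypothetical holder for the cluster id; not an actual Kafka class."""

    def __init__(self, formatted_cluster_id=None):
        self._future = Future()
        if formatted_cluster_id is not None:
            # A formatted node (e.g. a bootstrap controller) already has the id
            # from meta.properties, so its readers never block.
            self._future.set_result(formatted_cluster_id)

    def on_discovered(self, cluster_id):
        """Called when the first Fetch/FetchSnapshot surfaces the cluster id."""
        if self._future.done():
            # Assumption: a second, different id is a hard error, since the
            # cluster id must not change over the lifetime of a cluster.
            if self._future.result() != cluster_id:
                raise ValueError("cluster id mismatch")
            return
        self._future.set_result(cluster_id)

    def await_cluster_id(self, timeout_s):
        """Block a startup reader until the id is known or the timeout expires."""
        return self._future.result(timeout=timeout_s)
```

In this sketch, each startup reader would call `await_cluster_id` before initializing. If the id is never discovered, the call raises `TimeoutError`, which corresponds to startup timing out and the node shutting down.
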
Observer controllers are essentially brokers from the perspective of KRaft, so I think it is okay to block even the initialization of QuorumController until the cluster id is discovered. Just like with brokers, we only block for one successful Fetch/FetchSnapshot loop until this data is known. One detail: auto-joining observers in kraft.version=1 need to wait until they have persisted the cluster id before they try to join the voter set.

RE JR9.1: This can also mean the broker skipped formatting and does not have a cluster id. In this case, it will persist the cluster id to meta.properties. The other case is when the broker has a cluster.id in meta.properties. In this case, the broker cannot discover a different cluster id via a ClusterIdRecord in a FetchResponse. In fact, the broker will not be able to successfully complete any KRaft RPCs against the leader. Before the broker can receive a non-error FetchResponse with metadata records (which would be the only way to learn of a different ClusterIdRecord), the KRaft leader checks that the request cluster id is absent, or that it matches its own (which is the cluster id in its meta.properties/ClusterIdRecord, if the invariant I mentioned in my previous message is enforced properly). This case could happen when the bootstrap endpoints point to the wrong cluster during the restart of a node. The logic above would result in startup timing out and the node shutting down, because the local node is not able to participate in KRaft for another cluster.

RE JR9.2: Yes, the broker's startup will eventually time out and fail. The broker won't have cluster.id in meta.properties, and the cluster cannot send the broker a cluster id via ClusterIdRecord. The same would apply to an observer controller. This is a misconfiguration in my opinion.

On Tue, Mar 3, 2026 at 12:22 AM Jun Rao via dev <[email protected]> wrote:

> Hi, Kevin,
>
> Thanks for the reply.
>
> JR1. ClusterIdRecord:
> It would be better for ClusterId to have the type uuid. This will make it consistent with topicId in TopicRecord. If an existing string can't be converted to uuid, we can fail the node. This shouldn't happen for a well formatted cluster, right?
>
> JR6. Have you decided what to include in this KIP? If this KIP still requires the formatting for bootstrap controllers, what's described here can't happen.
>
> JR7. "After KIP-1286, kafka operators no longer need to format all nodes"
> KIP-1286 seems to be the wrong KIP?
>
> JR8. "The readers of cluster id initialized during startup can wait for both the above before being initialized."
> What are those readers? Are they ok to block?
>
> JR9. A couple more upgrade scenarios.
> JR9.1 If the MV has been bumped, after a broker starts up, it discovers that the clusterId in ClusterIdRecord doesn't match the one in meta.properties. Will the broker fail?
> JR9.2 If the MV hasn't been bumped, a new broker with the new version of the software is started without formatting, will it fail during startup?
>
> Jun
>
> On Wed, Feb 18, 2026 at 8:49 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Thanks for the replies and questions.
> >
> > RE JR1: Updated the KIP with the record schema for ClusterIdRecord. One thing I'm not sure about yet is whether or not the record field should be of UUID or String type. This is because kafka's quickstart docs refer to setting `--cluster-id` to a UUID in the storage tool. However, many places in kafka broker/controller code (e.g. the raft client, broker lifecycle manager, and even the formatter itself) only require this type to be a String. Since not all Strings are valid UUIDs, making this record field of type UUID might be too restrictive and complicate upgrading the MV for existing clusters, since they might have a non-UUID cluster id string, but need to write this record when upgrading to an MV that supports this feature. Let me know what you think.
> >
> > RE JR2: Any controller node formatted with `--standalone, --initial-controllers` or who is part of the static voter set defined by `controller.quorum.voters` can write the ClusterIdRecord by including the `--cluster-id` argument to `kafka-storage format`. However, if the MV of the cluster supports it, there is exactly one writer of this record to the cluster metadata partition. The writer is the first active controller, who writes this record alongside other bootstrap metadata records (e.g. metadata version) during controller activation. At this point, we already depend on MV existing, since the active controller writes these bootstrap metadata records as a transaction if the MV supports it. I think writing the cluster id record would follow a similar pattern.
> >
> > RE JR3: When a node formats, it will write the meta.properties file. During formatting, a node must resolve the MV it wants to format with, which is explained more in RE JR5. I need to think about this more, but I think we should keep `--cluster-id` as a required flag for invoking the format command. If a broker/observer controller does not format, meta.properties is written without cluster id immediately after startup (i.e. where we read it from disk now in KafkaRaftServer).
> >
> > RE JR4: Yeah, will do. In this context, when I say observers I'm referring to any controllers who are not part of the KRaft voter set when they start kafka, or any brokers. I will make this explicit in the KIP. From the perspective of this feature and KRaft leader election, controller nodes who format with `--no-initial-controllers`, controller nodes who are not part of `controller.quorum.voters`, and brokers, all do not "need" to format, since they cannot become the active controller. This means they can resolve metadata like the cluster id after discovering the leader. We have a similar pattern with how controller nodes who format with `--no-initial-controllers` discover the kraft version of the cluster.
> >
> > RE JR5: If a node formats, it must resolve a metadata version with which to format. This comes from the `--release-version/--feature` flag and defaults to the latest production MV. Therefore, when a node formats with a metadata version that supports this feature, it will write the ClusterIdRecord to its `0-0/bootstrap.checkpoint`. If the node formats with a metadata version that does not support this feature, it does not write ClusterIdRecord to its `0-0/bootstrap.checkpoint`. If a node skips formatting, it is assumed that this node is part of a cluster whose MV supports this. Otherwise, this is a misconfiguration and the node will fail to register with the leader since there is no way for it to persist cluster id to its meta.properties without formatting.
> >
> > Although I did not specify this yet on the KIP explicitly, after some offline discussion I think it makes sense to enforce the following invariant as part of the feature design: if the persisted metadata version supports this feature, the ClusterId record must also be persisted. This is enforceable on the write-path for MV, which occurs at two points-- during formatting and during feature upgrades. There is a similar pattern with kraft.version, as it gets written to disk at the same two points.
> >
> > RE JR6: The main motivation for writing cluster id to meta.properties as well is because it can act as a projection of the cluster metadata partition which essentially only exposes the cluster id to readers. For example, the raft layer needs to be aware of the cluster id for its own RPC handling/validation, but raft cannot read metadata records. There are many readers of this cluster id value during the startup of the cluster. Therefore, avoiding a read of the metadata partition to discover the value of this metadata will prevent more complications of the startup code.
> >
> > Best,
> > Kevin Wu
> >
> > On Tue, Feb 17, 2026 at 7:35 PM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the KIP. A few comments.
> > >
> > > JR1. ClusterIdRecord : Could you define the record format?
> > >
> > > JR2. "a new MetadataVersion that supports encoding/decoding this record. This means that during formatting, the bootstrap ClusterIdRecord is only written if the cluster is formatted with a MV that supports this feature."
> > > Could you describe who writes the ClusterIdRecord? Is it the leader controller? Also, when is the record written? Do we guarantee that MV is available at that time?
> > >
> > > JR3. "meta.properties can be written during kafka broker/controller startup if it doesn't exist already (from formatting)"
> > > Could you describe when meta.properties is written? Is MV available at that time?
> > >
> > > JR4. "Introduce a metadata record for cluster id + observers persist cluster id to meta.properties from metadata publishing pipeline"
> > > Could you clarify what observers are? Are they observer controllers or are they brokers (which are referred to as observers to the controller)?
> > >
> > > JR5. "Bootstrap controllers can add a mandatory “cluster id” record during formatting"
> > > This sounds like adding a ClusterIdRecord is optional. If so, could you describe when a record will be added and when a record will not be added?
> > >
> > > JR6. "However, kafka should still be able to handle the case where a leader is elected without a cluster id in meta.properties , since KRaft does not need cluster.id in order to elect a leader. In this case, the active controller will write a cluster id record during the bootstrap metadata write."
> > > Hmm, earlier, the KIP says "Upon discovering the cluster ID for the first time, these nodes need to persist this to meta.properties". Why do we need to introduce a separate place to write the cluster id to meta.properties.
> > >
> > > Jun
> > >
> > > On Wed, Feb 11, 2026 at 10:21 AM Kevin Wu <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Manually bumping this thread after finalizing a design.
> > > > KIP link:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > > On Tue, Jan 6, 2026 at 7:18 AM Kevin Wu <[email protected]> wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I would like to start a discussion on KIP-1262, which proposes removing the formatting requirement for brokers and observer controllers. Currently, I am considering two high-level designs, and would appreciate community feedback on both approaches to decide on a final design.
> > > > >
> > > > > KIP link:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > >
> > > > > Best,
> > > > > Kevin Wu
