Hi, Kevin,

Thanks for the reply. The KIP looks good to me now.
Jun

On Tue, Mar 3, 2026 at 4:54 AM Kevin Wu <[email protected]> wrote:
> Hi Jun,
>
> Thanks for the reply.
>
> RE JR1: "If an existing string can't be converted to uuid, we can fail the node. This shouldn't happen for a well formatted cluster, right?"
> Currently, you can format a cluster with a non-UUID cluster ID string, and Kafka considers this "well-formatted" (i.e. the formatting code accepts a String, server startup works, clusterId is a String in memory, etc.). Our documentation references formatting with a UUID cluster id generated via `kafka-storage random-uuid`, but this is not a requirement in the code. If we make this record have a UUID to be consistent with TopicRecord, it is not clear to me what the MV upgrade path is for existing clusters that formatted `meta.properties` with a non-UUID String. We would have to write a new UUID cluster id, which violates the invariant that the cluster id cannot change over the lifetime of a cluster.
>
> RE JR6: I plan on still requiring bootstrap controllers to format. This means we should not expect a leader to be elected that does not have a cluster id. Bootstrap controllers will fail when reading in meta.properties in KafkaRaftServer. I will remove this section.
>
> RE JR7: Apologies, I mixed up the numbers with another KIP.
>
> RE JR8: For brokers, the readers of the cluster id during startup are the BrokerLifecycleManager, KafkaApis, DynamicTopicClusterQuotaPublisher, and endpointReadyFutures. It is okay to block startup on fetching the cluster id from KRaft, since we also block startup on the broker lifecycle manager's initial catch-up future. Discovering the cluster id value for the first time would only require a single FetchSnapshot or a Fetch of the bootstrap metadata records.
>
> For controllers, the readers are endpointReadyFutures, QuorumController, ControllerApis, ControllerRegistrationManager, and DynamicTopicClusterQuotaPublisher. For bootstrap controllers, this blocking does not occur. Observers are essentially brokers from the perspective of KRaft, so I think it is okay to block even the initialization of QuorumController until the cluster id is discovered. Just like with brokers, we only block for one successful Fetch/FetchSnapshot loop until this data is known. One detail is that auto-joining observers in kraft.version=1 need to wait until they persist the cluster id before they try to join the voter set.
>
> RE JR9.1: This can also mean the broker skipped formatting and does not have a cluster id. In this case, it will persist the cluster id to meta.properties.
>
> The other case is when the broker has a cluster.id in meta.properties. In this case, the broker cannot discover a different cluster id via a ClusterIdRecord in a FetchResponse. In fact, the broker will not be able to successfully complete any KRaft RPCs against the leader. For the broker to receive a non-error FetchResponse with metadata records (which would be the only way to learn of a different ClusterIdRecord), the KRaft leader checks that the request cluster id is absent, or that the request cluster id matches its own (which is the cluster id in its meta.properties/ClusterIdRecord, if the invariant I mentioned in my previous message is enforced properly). This case could happen when bootstrap endpoints point to the wrong cluster during a restart of a node. The logic above would result in startup timing out and shutting down the node, because the local node is not able to participate in KRaft for another cluster.
>
> RE JR9.2: Yes, the broker's startup will eventually time out and fail. The broker won't have cluster.id in meta.properties, and the cluster cannot send the broker a cluster id via a ClusterIdRecord. The same would apply for an observer controller. This is a misconfiguration in my opinion.
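[Editor's note] The leader-side check described in RE JR9.1 — accept a request's cluster id only when it is absent or equal to the leader's own — can be sketched as below. The class and method names are illustrative, not Kafka's actual implementation:

```java
// Illustrative sketch (not Kafka's code) of the RE JR9.1 validation: a KRaft
// leader accepts a request's cluster id only if the request omits it or it
// matches the leader's own id (from meta.properties / ClusterIdRecord).
public final class ClusterIdValidation {
    public static boolean accepts(String leaderClusterId, String requestClusterId) {
        // An absent cluster id in the request is allowed.
        if (requestClusterId == null) {
            return true;
        }
        // Otherwise the ids must match exactly; a mismatched node keeps getting
        // error responses and its startup eventually times out.
        return requestClusterId.equals(leaderClusterId);
    }
}
```

A node pointed at the wrong cluster's bootstrap endpoints would fail this check on every RPC, which is why startup times out rather than the node silently joining another cluster.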
> On Tue, Mar 3, 2026 at 12:22 AM Jun Rao via dev <[email protected]> wrote:
> > Hi, Kevin,
> >
> > Thanks for the reply.
> >
> > JR1. ClusterIdRecord: It would be better for ClusterId to have the type uuid. This will make it consistent with topicId in TopicRecord. If an existing string can't be converted to uuid, we can fail the node. This shouldn't happen for a well formatted cluster, right?
> >
> > JR6. Have you decided what to include in this KIP? If this KIP still requires the formatting for bootstrap controllers, what's described here can't happen.
> >
> > JR7. "After KIP-1286, kafka operators no longer need to format all nodes" KIP-1286 seems to be the wrong KIP?
> >
> > JR8. "The readers of cluster id initialized during startup can wait for both the above before being initialized." What are those readers? Are they ok to block?
> >
> > JR9. A couple more upgrade scenarios.
> > JR9.1 If the MV has been bumped, after a broker starts up, it discovers that the clusterId in ClusterIdRecord doesn't match the one in meta.properties. Will the broker fail?
> > JR9.2 If the MV hasn't been bumped and a new broker with the new version of the software is started without formatting, will it fail during startup?
> >
> > Jun
> >
> > On Wed, Feb 18, 2026 at 8:49 AM Kevin Wu <[email protected]> wrote:
> > > Hi Jun,
> > >
> > > Thanks for the replies and questions.
> > >
> > > RE JR1: Updated the KIP with the record schema for ClusterIdRecord. One thing I'm not sure about yet is whether or not the record field should be of UUID or String type. This is because Kafka's quickstart docs refer to setting `--cluster-id` to a UUID in the storage tool. However, many places in the Kafka broker/controller code (e.g. the raft client, the broker lifecycle manager, and even the formatter itself) only require this type to be a String. Since not all Strings are valid UUIDs, making this record field of type UUID might be too restrictive and complicate upgrading the MV for existing clusters, since they might have a non-UUID cluster id string but need to write this record when upgrading to an MV that supports this feature. Let me know what you think.
> > >
> > > RE JR2: Any controller node formatted with `--standalone` or `--initial-controllers`, or that is part of the static voter set defined by `controller.quorum.voters`, can write the ClusterIdRecord by including the `--cluster-id` argument to `kafka-storage format`. However, if the MV of the cluster supports it, there is exactly one writer of this record to the cluster metadata partition. The writer is the first active controller, which writes this record alongside other bootstrap metadata records (e.g. metadata version) during controller activation. At this point, we already depend on the MV existing, since the active controller writes these bootstrap metadata records as a transaction if the MV supports it. I think writing the cluster id record would follow a similar pattern.
> > >
> > > RE JR3: When a node formats, it will write the meta.properties file. During formatting, a node must resolve the MV it wants to format with, which is explained more in RE JR5. I need to think about this more, but I think we should keep `--cluster-id` as a required flag for invoking the format command. If a broker/observer controller does not format, meta.properties is written without a cluster id immediately after startup (i.e. where we read it from disk now in KafkaRaftServer).
> > >
> > > RE JR4: Yeah, will do. In this context, when I say observers I'm referring to any controllers that are not part of the KRaft voter set when they start Kafka, or any brokers. I will make this explicit in the KIP. From the perspective of this feature and KRaft leader election, controller nodes that format with `--no-initial-controllers`, controller nodes that are not part of `controller.quorum.voters`, and brokers all do not "need" to format, since they cannot become the active controller. This means they can resolve metadata like the cluster id after discovering the leader. We have a similar pattern with how controller nodes that format with `--no-initial-controllers` discover the kraft version of the cluster.
> > >
> > > RE JR5: If a node formats, it must resolve a metadata version with which to format. This comes from the `--release-version`/`--feature` flag and defaults to the latest production MV. Therefore, when a node formats with a metadata version that supports this feature, it will write the ClusterIdRecord to its `0-0/bootstrap.checkpoint`. If the node formats with a metadata version that does not support this feature, it does not write the ClusterIdRecord to its `0-0/bootstrap.checkpoint`. If a node skips formatting, it is assumed that this node is part of a cluster whose MV supports this. Otherwise, this is a misconfiguration and the node will fail to register with the leader, since there is no way for it to persist the cluster id to its meta.properties without formatting.
> > >
> > > Although I did not specify this yet on the KIP explicitly, after some offline discussion I think it makes sense to enforce the following invariant as part of the feature design: if the persisted metadata version supports this feature, the ClusterIdRecord must also be persisted. This is enforceable on the write path for MV, which occurs at two points: during formatting and during feature upgrades. There is a similar pattern with kraft.version, as it gets written to disk at the same two points.
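[Editor's note] The RE JR1 concern that "not all Strings are valid UUIDs" can be illustrated with a small sketch. A `kafka-storage random-uuid` value is a 22-character URL-safe base64 string that decodes to 16 bytes, while a legacy free-form cluster id string may not decode at all, or may decode to the wrong length. The helper below is illustrative only, not Kafka's actual validation:

```java
import java.util.Base64;

// Illustrative check (not Kafka's code): does a cluster id string decode to
// the 16 bytes that a uuid-typed record field would require?
public final class ClusterIdShape {
    public static boolean decodesToUuid(String clusterId) {
        try {
            byte[] bytes = Base64.getUrlDecoder().decode(clusterId);
            return bytes.length == 16;
        } catch (IllegalArgumentException e) {
            // Characters outside the URL-safe base64 alphabet, e.g. a
            // free-form legacy cluster id like "my prod cluster!".
            return false;
        }
    }
}
```

A cluster formatted with such a free-form id would have no lossless path to a uuid-typed ClusterIdRecord on MV upgrade, which is the upgrade-path problem RE JR1 raises.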
> > > RE JR6: The main motivation for writing the cluster id to meta.properties as well is that it can act as a projection of the cluster metadata partition which essentially only exposes the cluster id to readers. For example, the raft layer needs to be aware of the cluster id for its own RPC handling/validation, but raft cannot read metadata records. There are many readers of this cluster id value during the startup of the cluster. Therefore, avoiding a read of the metadata partition to discover the value of this metadata will prevent more complications of the startup code.
> > >
> > > Best,
> > > Kevin Wu
> > >
> > > On Tue, Feb 17, 2026 at 7:35 PM Jun Rao via dev <[email protected]> wrote:
> > > > Hi, Kevin,
> > > >
> > > > Thanks for the KIP. A few comments.
> > > >
> > > > JR1. ClusterIdRecord: Could you define the record format?
> > > >
> > > > JR2. "a new MetadataVersion that supports encoding/decoding this record. This means that during formatting, the bootstrap ClusterIdRecord is only written if the cluster is formatted with a MV that supports this feature." Could you describe who writes the ClusterIdRecord? Is it the leader controller? Also, when is the record written? Do we guarantee that MV is available at that time?
> > > >
> > > > JR3. "meta.properties can be written during kafka broker/controller startup if it doesn't exist already (from formatting)" Could you describe when meta.properties is written? Is MV available at that time?
> > > >
> > > > JR4. "Introduce a metadata record for cluster id + observers persist cluster id to meta.properties from metadata publishing pipeline" Could you clarify what observers are? Are they observer controllers or are they brokers (which are referred to as observers to the controller)?
> > > >
> > > > JR5. "Bootstrap controllers can add a mandatory “cluster id” record during formatting" This sounds like adding a ClusterIdRecord is optional. If so, could you describe when a record will be added and when a record will not be added?
> > > >
> > > > JR6. "However, kafka should still be able to handle the case where a leader is elected without a cluster id in meta.properties, since KRaft does not need cluster.id in order to elect a leader. In this case, the active controller will write a cluster id record during the bootstrap metadata write." Hmm, earlier, the KIP says "Upon discovering the cluster ID for the first time, these nodes need to persist this to meta.properties". Why do we need to introduce a separate place to write the cluster id to meta.properties?
> > > >
> > > > Jun
> > > >
> > > > On Wed, Feb 11, 2026 at 10:21 AM Kevin Wu <[email protected]> wrote:
> > > > > Hi all,
> > > > >
> > > > > Manually bumping this thread after finalizing a design.
> > > > > KIP link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > >
> > > > > Best,
> > > > > Kevin Wu
> > > > >
> > > > > On Tue, Jan 6, 2026 at 7:18 AM Kevin Wu <[email protected]> wrote:
> > > > > > Hello all,
> > > > > >
> > > > > > I would like to start a discussion on KIP-1262, which proposes removing the formatting requirement for brokers and observer controllers. Currently, I am considering two high-level designs, and would appreciate community feedback on both approaches to decide on a final design.
> > > > > >
> > > > > > KIP link: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1262%3A+Enable+auto-formatting+directories
> > > > > >
> > > > > > Best,
> > > > > > Kevin Wu
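[Editor's note] The invariant proposed in RE JR5 — if the persisted metadata version supports this feature, a ClusterIdRecord must also be persisted, enforced at the two MV write points (formatting and feature upgrade) — can be sketched as a simple guard. Names are illustrative, not Kafka's:

```java
// Illustrative guard (not Kafka's code) for the RE JR5 invariant: whenever a
// metadata version is about to be persisted (at format time or on a feature
// upgrade), an MV that supports the cluster id feature must be accompanied
// by a ClusterIdRecord.
public final class ClusterIdInvariant {
    public static void checkBeforePersistingMv(boolean mvSupportsClusterId,
                                               boolean clusterIdRecordPresent) {
        if (mvSupportsClusterId && !clusterIdRecordPresent) {
            throw new IllegalStateException(
                "MV supports ClusterIdRecord but no ClusterIdRecord is being persisted");
        }
    }
}
```

Enforcing the check at only those two write points suffices because, as the thread notes, those are the only places the MV reaches disk (the same pattern used for kraft.version).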
