gaurav-narula commented on code in PR #22384:
URL: https://github.com/apache/kafka/pull/22384#discussion_r3380173308
##########
server/src/test/java/org/apache/kafka/server/ReconfigurableQuorumIntegrationTest.java:
##########
@@ -255,7 +255,7 @@ public void testNewVoterAutoRemovesAndAdds() throws
Exception {
@Test
public void testRemoveAndAddVoterWithValidClusterId() throws Exception {
final var nodes = new TestKitNodes.Builder()
- .setClusterId("test-cluster")
+ .setClusterId("5EqhrOPYSkaSsXk4RYkNow")
Review Comment:
I revisited
[KIP-78](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id)
where this was introduced and it seems that the authors never **_mandated_**
the use of UUIDs:
> We update KafkaServer.startup() to invoke getOrGenerateClusterId() after
initZk().
The implementation of getOrGenerateClusterId() is straightforward. It first
tries to get the cluster id from /cluster/id. If the znode does not exist, it
generates an id via UUID.randomUUID(), converts it to a String via URL-safe
Base64 encoding and tries to create the znode. If the creation succeeds, the
generated id is returned. If it fails with a ZkNodeExistsException, then
another broker won the race, so we can just retrieve the cluster id from the
expected znode path and return it. The returned cluster id will be passed to
registered metric reporters who implement ClusterResourceListener and it will
be logged at info level.
IIUC, that leads to two situations:
1. Operators had the brokers create the ZNode themselves, in which case it
will be a base64 url encoded uuid
2. Operators pre-created the ZNode themselves, in which case it could be
populated with anything and the brokers [do not
enforce](https://github.com/apache/kafka/pull/1830/changes#diff-3638ff970bc6766f9e570f76a6440966930b990af33e53e5b3db9d65f90264e9R203)
any validation on the value
(2) is unfortunate because the KIP talks about restricting the set of valid
characters
> We propose adding the cluster_id to the Metadata response, storing it in
ZooKeeper, exposing it via metrics, to serializers and client interceptors. The
cluster_id should be unique, immutable, auto-generated and it may be used in
situations where the allowed character set is restricted. Given this, we limit
the allowed characters to the following regular expression [a-zA-Z0-9_\-]+,
which corresponds to the characters used by the URL-safe Base64 variant.
With KRaft, `kafka-storage.sh` doubles down by not performing any validation
while formatting. Rethinking about this, the validation in this PR does break
compatibility and should
(a) Either be reversed or
(b) Provide an override that disables validation check for users looking to
reformat existing clusters
I also feel `SharedServer#clusterId` can do with some documentation as to
why its type is a String and it's unsafe to parse it as a UUID and will help
reviewers and contributors alike.
WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]