gaurav-narula commented on code in PR #22384:
URL: https://github.com/apache/kafka/pull/22384#discussion_r3380173308


##########
server/src/test/java/org/apache/kafka/server/ReconfigurableQuorumIntegrationTest.java:
##########
@@ -255,7 +255,7 @@ public void testNewVoterAutoRemovesAndAdds() throws 
Exception {
     @Test
     public void testRemoveAndAddVoterWithValidClusterId() throws Exception {
         final var nodes = new TestKitNodes.Builder()
-            .setClusterId("test-cluster")
+            .setClusterId("5EqhrOPYSkaSsXk4RYkNow")

Review Comment:
   I revisited 
[KIP-78](https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id)
 where this was introduced and it seems that the authors never **_mandated_** 
the use of UUIDs:
   
   > We update KafkaServer.startup() to invoke getOrGenerateClusterId() after 
initZk().
   The implementation of getOrGenerateClusterId() is straightforward. It first 
tries to get the cluster id from /cluster/id. If the znode does not exist, it 
generates an id via UUID.randomUUID(), converts it to a String via URL-safe 
Base64 encoding and tries to create the znode. If the creation succeeds, the 
generated id is returned. If it fails with a ZkNodeExistsException, then 
another broker won the race, so we can just retrieve the cluster id from the 
expected znode path and return it. The returned cluster id will be passed to 
registered metric reporters who implement ClusterResourceListener and it will 
be logged at info level.
   
   IIUC, that leads to two situations:
   
   1. Operators had the brokers create the ZNode themselves, in which case it 
will be a base64 url encoded uuid
   2. Operators pre-created the ZNode themselves, in which case it could be 
populated with anything and the brokers [do not 
enforce](https://github.com/apache/kafka/pull/1830/changes#diff-3638ff970bc6766f9e570f76a6440966930b990af33e53e5b3db9d65f90264e9R203)
 any validation on the value
   
   (2) is unfortunate because the KIP talks about restricting the set of valid 
characters
   
   > We propose adding the cluster_id to the Metadata response, storing it in 
ZooKeeper, exposing it via metrics, to serializers and client interceptors. The 
cluster_id should be unique,  immutable, auto-generated and it may be used in 
situations where the allowed character set is restricted. Given this, we limit 
the allowed characters to the following regular expression [a-zA-Z0-9_\-]+, 
which corresponds to the characters used by the URL-safe Base64 variant.
   
   With KRaft, `kafka-storage.sh` doubles down by not performing any validation 
while formatting. Rethinking about this, the validation in this PR does break 
compatibility and should
   
   (a) Either be reversed or
   (b) Provide an override that disables validation check for users looking to 
reformat existing clusters
   
   I also feel `SharedServer#clusterId` can do with some documentation as to 
why its type is a String and it's unsafe to parse it as a UUID and will help 
reviewers and contributors alike.
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to