Haoze Wu created KAFKA-14882:
--------------------------------

             Summary: Uncoordinated states about topic in ZooKeeper nodes and 
Kafka brokers cause TopicExistException at client
                 Key: KAFKA-14882
                 URL: https://issues.apache.org/jira/browse/KAFKA-14882
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 2.8.0
            Reporter: Haoze Wu


We have been testing Kafka 2.8.0. We found some scenarios where 
TopicExistsException is thrown, and we feel that the design of the topic 
creation process in Kafka may sometimes confuse users.

When a client sends a topic create request to a Kafka broker, the following 
steps happen:
 # AdminManager checks the topic path in zkNodes and throws TopicExistsException 
if the topic exists (Kafka sends a request to ZooKeeper)
 # AdminManager adds the topic path to zkNodes (Kafka sends a request to 
ZooKeeper)
 # The controller’s ZookperRequestWatcher detects the change and enqueues the 
corresponding event (the ZooKeeper watcher sends a message to Kafka)
 # The event is taken off the queue and executed (the Kafka broker acting as 
controller sends a LeaderAndIsrRequest to the Kafka brokers, which may include 
itself)
 # The broker handles the request, and the flow returns to step #1

One symptom we observed is that when step #4 is delayed (stuck for some 
reason), the client may retry by sending the topic create request again, which 
triggers TopicExistsException in step #1. However, a topic create request 
should behave like a “transaction”: it should have some atomicity and also be 
robust under concurrent topic creation.
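
To make the symptom concrete, the flow can be modelled with a small toy 
program. This is only a sketch under our own assumptions, not Kafka code: the 
class TopicCreateRace, its fields, and the nested TopicExistsException are 
hypothetical stand-ins, and the ZooKeeper state, the controller event queue, 
and the broker state are reduced to in-memory collections. It shows that a 
retry issued while step #4 is stalled is rejected in step #1 although no broker 
can serve the topic yet.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Toy model of the topic create flow described above; not Kafka code. */
public class TopicCreateRace {
    /** Hypothetical stand-in for Kafka's TopicExistsException. */
    static class TopicExistsException extends RuntimeException {
        TopicExistsException(String topic) {
            super("Topic '" + topic + "' already exists.");
        }
    }

    // Steps 1-2: topic paths registered in ZooKeeper (assumed representation).
    private final Set<String> zkTopicPaths = new HashSet<>();
    // Step 3: events queued for the controller.
    private final Queue<String> controllerEvents = new ArrayDeque<>();
    // Step 5: topics that brokers are actually able to serve.
    private final Set<String> readyTopics = new HashSet<>();

    /** Steps 1-3: check the path, register it, enqueue the controller event. */
    void createTopic(String topic) {
        if (zkTopicPaths.contains(topic)) {
            throw new TopicExistsException(topic);
        }
        zkTopicPaths.add(topic);
        controllerEvents.add(topic);
    }

    /** Steps 4-5: the controller takes one event and the broker applies it. */
    void processNextEvent() {
        String topic = controllerEvents.poll();
        if (topic != null) {
            readyTopics.add(topic);
        }
    }

    boolean isReady(String topic) {
        return readyTopics.contains(topic);
    }

    /** True if a retry sent before step 4 is rejected while the topic is not ready. */
    static boolean retryFailsBeforeTopicIsReady() {
        TopicCreateRace cluster = new TopicCreateRace();
        cluster.createTopic("orders");      // first request passes steps 1-3
        try {
            cluster.createTopic("orders");  // client retry while step 4 is stalled
            return false;
        } catch (TopicExistsException e) {
            return !cluster.isReady("orders");
        }
    }

    public static void main(String[] args) {
        System.out.println("retry rejected while topic not ready: "
                + retryFailsBeforeTopicIsReady());
    }
}
```

In this model the exception only disappears from the client's point of view 
once processNextEvent() has run, which corresponds to steps #4 and #5 
completing before the retry arrives.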

After some inspection, we found that such a feature is not easy to implement 
in Kafka given the current implementation. Still, our complaint is that the 
client gets TopicExistsException when the topic does not actually exist or is 
not yet ready.

We suggest at least providing a utility that helps users mitigate this issue, 
for example a tool that helps users clean up the ZooKeeper data and ensure the 
consistency of the topic metadata.
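
As a rough illustration of the check such a utility might perform, the sketch 
below compares the topics registered in ZooKeeper with the topics that brokers 
report as ready and lists the ones present only in ZooKeeper. No such tool 
exists as far as this report is concerned: the class TopicConsistencyCheck, the 
method danglingTopics, and the sample topic names are hypothetical, and how the 
two sets would be collected from a real cluster is left out on purpose.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical consistency check; the inputs are assumed to be collected elsewhere. */
public class TopicConsistencyCheck {
    /** Returns topics registered in ZooKeeper that no broker reports as ready. */
    static Set<String> danglingTopics(Set<String> zkTopics, Set<String> brokerTopics) {
        Set<String> dangling = new TreeSet<>(zkTopics); // sorted for stable output
        dangling.removeAll(brokerTopics);
        return dangling;
    }

    public static void main(String[] args) {
        // Sample data only; a real tool would read these from the cluster.
        Set<String> zkTopics = Set.of("orders", "payments", "audit");
        Set<String> brokerTopics = Set.of("orders", "audit");
        System.out.println(danglingTopics(zkTopics, brokerTopics));
    }
}
```

Whether a dangling entry should be deleted or simply waited on is a separate 
decision, since a topic whose creation is merely delayed in step #4 looks the 
same as one whose creation has failed.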

We look forward to feedback from the community. We can provide concrete cases, 
reproduction scripts, and an analysis of the workload if needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
