Dong Lin created KAFKA-6618:

             Summary: Prevent two controllers from updating znodes concurrently
                 Key: KAFKA-6618
             Project: Kafka
          Issue Type: Improvement
            Reporter: Dong Lin
            Assignee: Dong Lin

Kafka controller may fail to function properly (even after repeated controller 
movement) due to the following sequence of events:

- User requests topic deletion
- Controller A deletes the partition znode
- Controller B becomes controller and reads the topic znode
- Controller A deletes the topic znode and remove the topic from the topic 
deletion znode
- Controller B reads the partition znode and topic deletion znode
- According to controller B's context, the topic znode exists, the topic is not 
listed for deletion, and some partition is not found for the given topic. Then 
controller B will create topic znode with empty data (i.e. partition 
assignment) and create the partition znodes.
- All controller after controller B will fail because there is not data in the 
topic znode.

The long term solution is to have a way to prevent old controller from writing 
to zookeeper if it is not the active controller. One idea is to use the 
zookeeper multi API (See 
 such that controller only writes to zookeeper if the zk version of the 
controller znode has not been changed.

The short term solution is to let controller reads the topic deletion znode 
first. If the topic is still listed in the topic deletion znode, then the new 
controller will properly handle partition states of this topic without creating 
partition znodes for this topic. And if the topic is not listed in the topic 
deletion znode, then both the topic znode and the partition znodes of this 
topic should have been deleted by the time the new controller tries to read 

This message was sent by Atlassian JIRA

Reply via email to