Dong Lin created KAFKA-6618:
-------------------------------
Summary: Prevent two controllers from updating znodes concurrently
Key: KAFKA-6618
URL: https://issues.apache.org/jira/browse/KAFKA-6618
Project: Kafka
Issue Type: Improvement
Reporter: Dong Lin
Assignee: Dong Lin
The Kafka controller may fail to function properly (even after repeated
controller movement) due to the following sequence of events:
- User requests topic deletion
- Controller A deletes the partition znode
- Controller B becomes the controller and reads the topic znode
- Controller A deletes the topic znode and removes the topic from the topic
deletion znode
- Controller B reads the partition znode and topic deletion znode
- According to controller B's context, the topic znode exists, the topic is not
listed for deletion, and some partition is not found for the given topic.
Controller B will therefore create the topic znode with empty data (i.e. an
empty partition assignment) and create the partition znodes.
- Every controller elected after controller B will fail because there is no
data in the topic znode.
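
For reference, the znodes mentioned above map to the following ZooKeeper paths
(a minimal sketch; the helper class and method names are invented for this
illustration, while the paths follow Kafka's standard znode layout):

{code:java}
// Helper names are invented for this sketch; only the paths matter.
public final class ZkPaths {

    // Topic znode: stores the partition assignment of the topic.
    static String topicZNode(String topic) {
        return "/brokers/topics/" + topic;
    }

    // Partition (state) znode: one per partition, stores leader/ISR info.
    static String partitionStateZNode(String topic, int partition) {
        return "/brokers/topics/" + topic + "/partitions/" + partition + "/state";
    }

    // Topic deletion znode: a child under this path marks the topic for deletion.
    static String topicDeletionZNode(String topic) {
        return "/admin/delete_topics/" + topic;
    }
}
{code}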
The long term solution is to have a way to prevent an old controller from
writing to ZooKeeper if it is not the active controller. One idea is to use the
ZooKeeper multi API (see
[https://zookeeper.apache.org/doc/r3.4.3/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable)])
so that the controller only writes to ZooKeeper if the zk version of the
controller znode has not changed.
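
A minimal sketch of that idea, assuming the fencing check is done against the
zkVersion of the controller epoch znode (the class and method names, and the
choice of /controller_epoch as the fencing znode, are assumptions made for this
illustration; Op.check/Op.delete/Op.setData and ZooKeeper.multi are the
standard ZooKeeper client API):

{code:java}
import java.util.Arrays;
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: every write is bundled into one multi() transaction together
// with a check op on the fencing znode. If another controller has been elected
// since expectedEpochZkVersion was cached, the check op fails and the whole
// transaction is rejected, so the stale controller cannot write anything.
public class FencedZkWriter {

    // Znode whose zkVersion changes on every controller change. Using
    // /controller_epoch here is an assumption made for this sketch.
    private static final String FENCING_ZNODE = "/controller_epoch";

    private final ZooKeeper zk;
    private final int expectedEpochZkVersion; // cached when this controller won the election

    public FencedZkWriter(ZooKeeper zk, int expectedEpochZkVersion) {
        this.zk = zk;
        this.expectedEpochZkVersion = expectedEpochZkVersion;
    }

    // Delete a znode only if this controller is still the active one.
    public List<OpResult> fencedDelete(String path, int dataVersion)
            throws KeeperException, InterruptedException {
        return zk.multi(Arrays.asList(
                Op.check(FENCING_ZNODE, expectedEpochZkVersion),
                Op.delete(path, dataVersion)));
    }

    // Overwrite a znode's data only if this controller is still the active one.
    public List<OpResult> fencedSetData(String path, byte[] data, int dataVersion)
            throws KeeperException, InterruptedException {
        return zk.multi(Arrays.asList(
                Op.check(FENCING_ZNODE, expectedEpochZkVersion),
                Op.setData(path, data, dataVersion)));
    }
}
{code}

Because multi() applies its ops atomically, a failed check op raises a
KeeperException and none of the bundled ops take effect, so a stale
controller's write is dropped as a unit.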
The short term solution is to let the controller read the topic deletion znode
first. If the topic is still listed in the topic deletion znode, the new
controller will properly handle the partition states of this topic without
creating partition znodes for it. If the topic is not listed in the topic
deletion znode, then both the topic znode and the partition znodes of this
topic should have been deleted by the time the new controller tries to read
them.
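
A rough sketch of that read ordering during controller initialization
(ZkReader and all other names below are placeholders rather than Kafka's
actual classes; only the order of the reads is the point):

{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Placeholder types and names; only the read ordering is meaningful.
public class ControllerContextLoader {

    interface ZkReader {
        Set<String> topicsMarkedForDeletion();                          // children of /admin/delete_topics
        Set<String> allTopics();                                        // children of /brokers/topics
        Map<Integer, List<Integer>> partitionAssignment(String topic);  // data of /brokers/topics/<topic>
    }

    private final ZkReader zk;

    public ControllerContextLoader(ZkReader zk) {
        this.zk = zk;
    }

    public void initialize() {
        // 1. Read the topic deletion znode FIRST, so that a topic whose znodes
        //    are still being torn down by the previous controller is treated as
        //    "being deleted" rather than as a topic with missing data.
        Set<String> topicsToDelete = new HashSet<>(zk.topicsMarkedForDeletion());

        // 2. Only then read the topic and partition znodes.
        for (String topic : zk.allTopics()) {
            if (topicsToDelete.contains(topic)) {
                // Still queued for deletion: resume deletion handling instead of
                // (re)creating topic/partition znodes with empty data.
                continue;
            }
            // Not queued for deletion: by the time we read it, either both the
            // topic znode and its partition znodes exist, or both are gone.
            Map<Integer, List<Integer>> assignment = zk.partitionAssignment(topic);
            // ... populate the controller context from the assignment ...
        }
    }
}
{code}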
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)