Fedor Korotkiy created KAFKA-1310:
-------------------------------------
Summary: Zookeeper timeout causes deadlock in Controller
Key: KAFKA-1310
URL: https://issues.apache.org/jira/browse/KAFKA-1310
Project: Kafka
Issue Type: Bug
Reporter: Fedor Korotkiy
Steps to reproduce:
1. Check out and build the 0.8.1 branch from GitHub:
git clone git@github.com:apache/kafka.git && cd kafka && git checkout origin/0.8.1 && ./gradlew jar
2. Start the ZooKeeper server:
./bin/zookeeper-server-start.sh config/zookeeper.properties
3. Start the Kafka server:
./bin/kafka-server-start.sh config/server.properties
4. Suspend the ZooKeeper process for 10 seconds (Ctrl-Z, wait, then %1 to resume
it) so that the broker's ZooKeeper session expires.
5. Observe that the Kafka broker has not re-registered itself in ZooKeeper:
./bin/zookeeper-shell.sh
ls /brokers/ids
>> []
The root cause appears to be a deadlock between DeleteTopicsThread and
SessionExpirationListener in KafkaController:
1. DeleteTopicsThread acquires controllerLock and calls await() on deleteTopicsCond
in awaitTopicDeletionNotification(). While it waits, await() releases controllerLock.
2. SessionExpirationListener fires. It acquires controllerLock and tries to shut
down deleteTopicManager (in onControllerResignation()). This interrupts
DeleteTopicsThread, and the listener then waits for that thread to exit while
still holding controllerLock.
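3. DeleteTopicsThread cannot return from deleteTopicsCond.await(), not even to
throw InterruptedException, because await() must first reacquire controllerLock,
which the listener is holding. Each thread waits on the other: a deadlock.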
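The interleaving above can be modeled outside Kafka with java.util.concurrent. This is a minimal sketch, assuming deleteTopicsCond is a Condition created from controllerLock and that shutting down deleteTopicManager means interrupting DeleteTopicsThread and then waiting for it to exit. ControllerDeadlockSketch, runScenario, TIMEOUT_MS and the latches are hypothetical names for illustration; this is not Kafka's controller code, and the bounded wait exists only so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the KAFKA-1310 interleaving; not Kafka's actual code.
final class ControllerDeadlockSketch {
    // Bounded wait so a deadlock is reported instead of hanging forever.
    static final long TIMEOUT_MS = 1000;

    /**
     * Returns true if the "listener" sees the worker exit within TIMEOUT_MS,
     * false if the shutdown appears stuck.
     */
    static boolean runScenario(boolean releaseLockWhileWaiting) throws InterruptedException {
        ReentrantLock controllerLock = new ReentrantLock();
        Condition deleteTopicsCond = controllerLock.newCondition();
        CountDownLatch workerStarted = new CountDownLatch(1);
        CountDownLatch workerExited = new CountDownLatch(1);

        // Stand-in for DeleteTopicsThread (step 1): take the lock, wait on the condition.
        Thread worker = new Thread(() -> {
            controllerLock.lock();
            try {
                workerStarted.countDown();
                while (true) {
                    // await() releases controllerLock while parked and must
                    // reacquire it before returning or throwing.
                    deleteTopicsCond.await();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            } finally {
                controllerLock.unlock();
                workerExited.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
        workerStarted.await();

        // Stand-in for SessionExpirationListener (step 2). lock() succeeds only
        // once the worker is parked in await(), because await() released the lock.
        controllerLock.lock();
        boolean finished;
        try {
            worker.interrupt(); // models shutting down deleteTopicManager
            if (releaseLockWhileWaiting) {
                // Drop the lock so the interrupted worker can reacquire it and exit.
                controllerLock.unlock();
                try {
                    finished = workerExited.await(TIMEOUT_MS, TimeUnit.MILLISECONDS);
                } finally {
                    controllerLock.lock();
                }
            } else {
                // Step 3: wait for the worker while still holding the lock. The
                // worker cannot leave await() without the lock, so this times out.
                finished = workerExited.await(TIMEOUT_MS, TimeUnit.MILLISECONDS);
            }
        } finally {
            controllerLock.unlock();
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wait while holding lock -> worker exited: " + runScenario(false));
        System.out.println("release lock before wait -> worker exited: " + runScenario(true));
    }
}
```

Running main should report that the worker did not exit when the listener waits while holding the lock, and that it does exit when the lock is released first. Dropping the lock around the wait is only one illustrative way to break the cycle, not necessarily the change made for this issue.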
--
This message was sent by Atlassian JIRA
(v6.2#6252)