[ 
https://issues.apache.org/jira/browse/KAFKA-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942939#comment-13942939
 ] 

Michael Noll commented on KAFKA-1310:
-------------------------------------

I can confirm this issue, using Kafka 0.8.1.

Here are the error messages when trying to create a topic:

{code}
$ bin/kafka-topics.sh --create --zookeeper zookeeper1:2181 --topic testing 
--partitions 1 --replication-factor 1
Error while executing topic command 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/ids
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /brokers/ids
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
        at kafka.utils.ZkUtils$.getChildren(ZkUtils.scala:480)
        at kafka.utils.ZkUtils$.getSortedBrokerList(ZkUtils.scala:81)
        at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:154)
        at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:88)
        at kafka.admin.TopicCommand$.main(TopicCommand.scala:50)
        at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /brokers/ids
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1277)
        at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:99)
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        ... 8 more
{code}

If you use the ZK CLI you will sometimes see a znode under {{/brokers/ids}}, 
sometimes not.  In my limited testing I could, for instance, create a topic 
(partitions=1, replicas=1) and then list/describe it.  But at least when I 
reached the point to try sending messages to it, it would fail.  See next 
example.

When trying to use the console producer to sent a test message "foo" (String) 
to the topic/broker:

{code}
$ bin/kafka-console-producer.sh --topic testing --broker-list localhost:9092
foo   <<< This is the test message, manually entered in the console/terminal
[2014-03-20 09:45:32,223] WARN Error while fetching metadata [{TopicMetadata 
for topic testing ->
No partition metadata for topic testing due to 
kafka.common.LeaderNotAvailableException}] for topic [testing]: class 
kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo)
[2014-03-20 09:45:32,233] WARN Error while fetching metadata [{TopicMetadata 
for topic testing ->
No partition metadata for topic testing due to 
kafka.common.LeaderNotAvailableException}] for topic [testing]: class 
kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo)
[2014-03-20 09:45:32,234] ERROR Failed to collate messages by topic, partition 
due to: Failed to fetch topic metadata for topic: testing 
(kafka.producer.async.DefaultEventHandler)
{code}

*How to reproduce*

Using Wirbelsturm you can reproduce this error as follow.  This assumes you 
have Vagrant 1.4.x and VirtualBox already installed on your host machine.

{code}
$ git clone https://github.com/miguno/wirbelsturm.git
$ cd wirbelsturm
$ ./bootstrap     # <<< May take a while depending on how fast your Internet 
connection is.

# Then uncomment the `kafka_broker` section in `wirbelsturm.yaml`.
# Only remove the leading `#` character in each line -- the remaining leading 
whitespace is significant.
$ vagrant up zookeeper1 kafka1     # <<< May take a while (boots VMs, downloads 
RPMs from the Internet to provision the VMs, etc.)
{code}

Now you can ssh into the VM {{kafka1}} via {{vagrant ssh kafka1}} and run the 
commands above from within the {{/opt/kafka}} directory.

> Zookeeper timeout causes deadlock in Controller
> -----------------------------------------------
>
>                 Key: KAFKA-1310
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1310
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Fedor Korotkiy
>            Assignee: Neha Narkhede
>            Priority: Blocker
>
> Steps to reproduce:
> 1. Checkout and build 0.8.1 branch from github:
> git clone g...@github.com:apache/kafka.git && cd kafka && git checkout 
> origin/0.8.1 && ./gradlew jar
> 2. Start zookeeper server:
> ./bin/zookeeper-server-start.sh config/zookeeper.properties
> 3. Start kafka server:
> ./bin/kafka-server-start.sh config/server.properties
> 4. Suspend zookeeper process for 10 seconds (ctrl-Z, then %1).
> 5. And kafka hasn't been re-registered in zookeeper.
> ./bin/zookeeper-shell.sh
> ls /brokers/ids
> >> []
> Root cause of the problem seems to be the deadlock between DeleteTopicsThread 
> and SessionExpirationListener in KafkaController.
> 1. DeleteTopicsThread acquires controllerLock and await()-s on 
> deleteTopicsCond in awaitTopicDeletionNotification()
> 2. SessionExpirationListener fires. It acquires controllerLock and tries to 
> shutdown deleteTopicManager(in onControllerResignation()). This interrupts 
> DeleteTopicsThread.
> 3. DeleteTopicsThread can't return from deleteTopicsCond.await() because 
> controllerLock is taken. We got a deadlock.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to