Hi all,

Mathias' FetcherRunnable error has bitten me a couple times too, and I've finally found a way to reproduce it. I can't reproduce it using the default config, but I can if I add a path to the zk connect string. Our deployments do add a path, and I'll go ahead and wager that Mathias' do, too, and LinkedIn's don't. ;)
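For anyone unsure what "a path in the connect string" means: ZooKeeper treats everything from the first "/" onward as a chroot that applies to the whole host list. Here is a minimal sketch of that split in plain shell. The function names zk_hosts and zk_chroot are made up for illustration; they are not part of Kafka or ZooKeeper.

```shell
# Hypothetical helpers (not part of Kafka or ZooKeeper): split a connect
# string of the form host1:port1,host2:port2/chroot into its two parts.

# Print the host list, i.e. everything before the first "/".
zk_hosts() {
  printf '%s\n' "${1%%/*}"
}

# Print the chroot path, i.e. everything from the first "/" onward.
# A connect string without a path is reported as the root, "/".
zk_chroot() {
  case "$1" in
    */*) printf '/%s\n' "${1#*/}" ;;
    *)   printf '/\n' ;;
  esac
}

zk_hosts  "localhost:2181/kafka"   # prints localhost:2181
zk_chroot "localhost:2181/kafka"   # prints /kafka
zk_chroot "localhost:2181"         # prints /
```

With the default config the chroot is "/", which is the case I could not reproduce the error with.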
The repro below works for me on tag kafka-0.7.0-incubating-candidate-3 from git://git.apache.org/kafka.git, and also on branch kafka-v0.6 from git://github.com/kafka-dev/kafka.git. (Both use zookeeper-3.3.3.jar.)

Dan

1. Add a path to the zk connect string in config/server.properties:

    -zk.connect=localhost:2181
    +zk.connect=localhost:2181/kafka

2. Start zk:

    $ bin/zookeeper-server-start.sh config/zookeeper.properties

3. Create the zk path:

    $ zkCli.sh -server localhost create /kafka null

4. Start a broker:

    $ bin/kafka-server-start.sh config/server.properties

5. Start a consumer:

    $ bin/kafka-console-consumer.sh --zookeeper localhost/kafka --topic one

6. Publish a message to create the topic and connect the consumer to the broker:

    $ date | bin/kafka-console-producer.sh --zookeeper localhost/kafka --topic one

7. Kill zk (^C), let the consumer disconnect, restart zk, and let the consumer reconnect

8. Kill the broker (^C)

After you kill the broker in (8), the consumer should log the error that Mathias reported:

    [2011-10-18 15:57:22,264] INFO multifetch reconnect due to java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. (kafka.consumer.SimpleConsumer)
    [2011-10-18 15:57:22,266] ERROR error in FetcherRunnable (kafka.consumer.FetcherRunnable)
    java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect(Native Method)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:500)
        at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:54)
        at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:130)
        at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:122)
        at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:64)
    [2011-10-18 15:57:22,267] INFO stopping fetcher FetchRunnable-0 to host 192.168.0.184 (kafka.consumer.FetcherRunnable)

At this point, the consumer should be unresponsive: it's lost the connection to its broker, and it won't rebalance if you restart the broker or add new consumers to its group.
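If you want to check whether a running consumer has already hit this state, one rough option is to scan its captured output for the two telltale lines from the log above. This is only a sketch: the function name fetcher_died is made up, and it assumes the consumer's output has been redirected to a file, which the console consumer does not do on its own.

```shell
# Hypothetical helper: succeed (exit 0) if the given log file contains both
# the FetcherRunnable error and the "stopping fetcher" line that follows it.
# Assumes the consumer's log output was captured to a file beforehand.
fetcher_died() {
  log_file=$1
  grep -q 'ERROR error in FetcherRunnable' "$log_file" &&
    grep -q 'INFO stopping fetcher' "$log_file"
}
```

For example, with the consumer started as `bin/kafka-console-consumer.sh ... > consumer.log 2>&1`, running `fetcher_died consumer.log && echo "fetcher stopped"` would flag the stuck consumer. It only matches the message text, so it cannot tell you which broker or partition was involved.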