Matt Jones created KAFKA-462:
--------------------------------

             Summary: ZK thread crashing doesn't bring down the broker (and doesn't come back up).
                 Key: KAFKA-462
                 URL: https://issues.apache.org/jira/browse/KAFKA-462
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.7
            Reporter: Matt Jones

I think the simplest explanation is the traceback. The broker had been up since 2012-07-31 18:45:42,951 (based on the 'Starting Kafka server' log entry), and the error was fixed by restarting the broker at 2012-08-14 20:59:41,581.

It looks like the ZooKeeper thread crashed, but the broker kept operating as usual. The expected behavior is that a crash of the ZooKeeper thread either brings down the whole broker, or the ZooKeeper thread starts itself back up.

[2012-08-08 01:25:13,398] 624270894 [main-SendThread(zookeeper001:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 8749ms for sessionid 0x138e4edc04c1e50, closing socket connection and attempting reconnect
[2012-08-08 01:25:15,136] 624272632 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
[2012-08-08 01:25:15,702] 624273198 [main-SendThread(zookeeper001:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server zookeeper003/10.125.95.193:2181
[2012-08-08 01:25:15,704] 624273200 [main-SendThread(zookeeper003:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to zookeeper003/10.125.95.193:2181, initiating session
[2012-08-08 01:25:15,709] 624273205 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
[2012-08-08 01:25:15,709] 624273205 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zookeeper001:2181,zookeeper002:2181,zookeeper003:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@26d66426
[2012-08-08 01:25:21,514] 624279010 [main-SendThread(zookeeper003:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x138e4edc04c1e50 has expired, closing socket connection
[2012-08-08 01:25:47,135] 624304631 [main-EventThread] ERROR org.apache.zookeeper.ClientCnxn - Error while calling watcher
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
Caused by: org.I0Itec.zkclient.exception.ZkException: Unable to connect to zookeeper001:2181,zookeeper002:2181,zookeeper003:2181
Caused by: java.net.UnknownHostException: zookeeper001
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:386)
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:331)
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:377)
[2012-08-08 01:25:48,620] 624306116 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
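
To make the expected behavior concrete, here is a rough sketch of a "reconnect or die" policy: retry the reconnect a bounded number of times with backoff, and escalate to a fatal handler if every attempt fails. This is an illustration only, not a proposed patch. The class ReconnectOrDie, its parameters, and the onFatal hook are hypothetical names and are not part of Kafka, ZkClient, or ZooKeeper. In a real broker the connect step would presumably wrap the ZooKeeper client re-creation that failed with UnknownHostException in the trace, and onFatal would presumably shut the process down.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of Kafka, ZkClient, or ZooKeeper. */
final class ReconnectOrDie {
    private final int maxAttempts;
    private final long initialBackoffMs;
    private final Runnable onFatal;

    ReconnectOrDie(int maxAttempts, long initialBackoffMs, Runnable onFatal) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.initialBackoffMs = initialBackoffMs;
        this.onFatal = onFatal;
    }

    /**
     * Runs connect up to maxAttempts times, doubling the wait between tries.
     * Returns true on the first success. If every attempt fails (or the wait
     * is interrupted), runs onFatal and returns false.
     */
    boolean reconnect(Callable<Void> connect) {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connect.call();
                return true;
            } catch (Exception e) {
                // A transient failure such as UnknownHostException lands here
                // instead of silently ending the calling thread.
                System.err.println("reconnect attempt " + attempt + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Interrupted while waiting: stop retrying and escalate.
                    Thread.currentThread().interrupt();
                    break;
                }
                backoffMs *= 2;
            }
        }
        onFatal.run();
        return false;
    }
}
```

With a policy along these lines, a temporary DNS failure during session re-establishment would either be ridden out by a later attempt or would take the broker down visibly, rather than leaving it running without a ZooKeeper session.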