massakam opened a new issue #2289: Broker suddenly goes down
URL: https://github.com/apache/incubator-pulsar/issues/2289

Recently, brokers have occasionally gone down in some of our clusters. The following is an excerpt from the log of a broker that went down.

```
13:30:11.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 25 seconds
13:30:13.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 23 seconds
13:30:15.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 21 seconds
13:30:17.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 19 seconds
13:30:19.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 17 seconds
13:30:21.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 15 seconds
13:30:23.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 13 seconds
13:30:25.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 11 seconds
13:30:27.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 8 seconds
13:30:29.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 6 seconds
13:30:31.465 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 4 seconds
13:30:33.466 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 2 seconds
13:30:35.466 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - zoo keeper disconnected, waiting to reconnect, time remaining = 0 seconds
13:30:37.466 [pulsar-zk-session-watcher-12-1] ERROR o.a.p.z.ZooKeeperSessionWatcher - timeout expired for reconnecting, invoking shutdown service
13:30:37.467 [pulsar-zk-session-watcher-12-1] INFO org.apache.zookeeper.ZooKeeper - Session: 0x164f333639f0269 closed
13:30:37.467 [pulsar-zk-session-watcher-12-1] INFO o.a.p.b.MessagingServiceShutdownHook - Invoking Runtime.halt(-1)
```

The broker service was shut down because it could not reconnect to ZK for a long time. However, all ZK servers seemed to be working normally at that time. Does anyone know the cause?

#### System configuration

- **Cluster-A**
  - **Pulsar version**: 1.22.1-incubating
  - **ZK version**: 3.4.10
- **Cluster-B**
  - **Pulsar version**: 2.0.1-incubating
  - **ZK version**: 3.4.10

ZK settings:

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/var/pulsar-zookeeper
clientPort=2181
maxClientCnxns=0
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=xxxx:2182:2183
server.2=xxxx:2182:2183
server.3=xxxx:2182:2183
server.4=xxxx:2182:2183
server.5=xxxx:2182:2183
```
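
To quantify how long the broker stayed disconnected before it shut itself down, the timestamps in the log excerpt above can be compared directly. The following is a minimal Python sketch and not part of Pulsar: the function name `disconnect_window` and the regular expression are assumptions derived only from the log format shown in this issue, and it assumes all lines fall on the same day.

```python
import re
from datetime import datetime

# Assumed pattern: matches ZooKeeperSessionWatcher lines as they appear
# in the excerpt above (timestamp, thread name, level, logger, message).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}) \[[^\]]+\] +(?P<level>WARN|ERROR) +"
    r"o\.a\.p\.z\.ZooKeeperSessionWatcher - (?P<msg>.*)$"
)


def disconnect_window(log_lines):
    """Return the seconds between the first 'disconnected' warning and the
    'timeout expired' error, or None if either line is missing.

    The log carries no date, so a window that crosses midnight is not handled.
    """
    first_warn = None
    for line in log_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines from other loggers
        ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        msg = m.group("msg")
        if first_warn is None and "zoo keeper disconnected" in msg:
            first_warn = ts
        elif first_warn is not None and "timeout expired" in msg:
            return (ts - first_warn).total_seconds()
    return None


# First warning and the final error, copied from the excerpt above.
sample = [
    "13:30:11.464 [pulsar-zk-session-watcher-12-1] WARN o.a.p.z.ZooKeeperSessionWatcher - "
    "zoo keeper disconnected, waiting to reconnect, time remaining = 25 seconds",
    "13:30:37.466 [pulsar-zk-session-watcher-12-1] ERROR o.a.p.z.ZooKeeperSessionWatcher - "
    "timeout expired for reconnecting, invoking shutdown service",
]
print(disconnect_window(sample))
```

For the excerpt above, this gives a window of roughly 26 seconds between the first disconnect warning and the shutdown, which may help when correlating with ZK server logs or GC pauses on the broker over the same interval.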
