[ https://issues.apache.org/jira/browse/KAFKA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sadek updated KAFKA-3004:
-------------------------
    Description:
While doing load testing we have noticed that the controller will fail over almost every hour with the following entry on its log:
INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
I also see an increase in minor-GC collection around the same time.
2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000: 8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs] 603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00, real=5.71 secs]
I've tried increasing zookeeper.connection.timeout.ms to 60000 but it doesn't seem to help.
Any idea what may be causing this?
Thanks!

  was:
While doing load testing we have noticed that the controller will fail over almost every hour with the following entry on its log:
INFO [SessionExpirationListener on 4], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
I've tried increasing zookeeper.connection.timeout.ms to 60000 but it doesn't seem to help.
Any idea what may be causing this?
Thanks!


> Controller failing over repeatedly
> ----------------------------------
>
>                 Key: KAFKA-3004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3004
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8.2.0
>         Environment: Centos 6.5
> OpenJDK 1.7.0_79
> 6 Kafka nodes
> 3 ZK nodes (cluster mode)
>            Reporter: Sadek
>            Assignee: Neha Narkhede
>
> While doing load testing we have noticed that the controller will fail over
> almost every hour with the following entry on its log:
> INFO [SessionExpirationListener on 4], ZK expired; shut down all controller
> components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener)
> I also see an increase in minor-GC collection around the same time.
> 2015-12-17T15:57:38.516+0000: 8166.220: [GC2015-12-17T15:57:38.516+0000:
> 8166.220: [ParNew: 283592K->4176K(314560K), 0.0081650 secs]
> 603757K->324456K(1013632K), 5.7134120 secs] [Times: user=0.05 sys=0.00,
> real=5.71 secs]
> I've tried increasing zookeeper.connection.timeout.ms to 60000 but it doesn't
> seem to help.
> Any idea what may be causing this?
> Thanks!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
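
The GC entry quoted above reports real=5.71 secs against user=0.05 and sys=0.00, so the wall-clock stall is much longer than the CPU time spent collecting. To check whether stalls like this line up with the session expirations, a small script along the following lines could scan the GC log for entries whose real time exceeds a chosen threshold. This is only a sketch: the function name `long_pauses` and the threshold values are hypothetical, and the pattern assumes every entry ends with a `[Times: ...]` section in the same format as the line in this report.

```python
import re

# Matches the trailing timing section of a GC log entry, e.g.
# "[Times: user=0.05 sys=0.00, real=5.71 secs]" (format taken from the report).
TIMES_RE = re.compile(
    r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]"
)


def long_pauses(lines, threshold_secs):
    """Return (user, sys, real) tuples for entries whose real time
    exceeds threshold_secs. Lines without a [Times: ...] section are skipped."""
    hits = []
    for line in lines:
        match = TIMES_RE.search(line)
        if match is None:
            continue
        user, sys_time, real = (float(g) for g in match.groups())
        if real > threshold_secs:
            hits.append((user, sys_time, real))
    return hits


# The GC entry from this report, joined onto one line.
sample = (
    "2015-12-17T15:57:38.516+0000: 8166.220: "
    "[GC2015-12-17T15:57:38.516+0000: 8166.220: "
    "[ParNew: 283592K->4176K(314560K), 0.0081650 secs] "
    "603757K->324456K(1013632K), 5.7134120 secs] "
    "[Times: user=0.05 sys=0.00, real=5.71 secs]"
)

if __name__ == "__main__":
    # 5.0 is an arbitrary illustrative threshold, not a Kafka default.
    for user, sys_time, real in long_pauses([sample], threshold_secs=5.0):
        print(f"long pause: real={real}s user={user}s sys={sys_time}s")
```

Running it over the full broker GC log with a threshold close to the configured ZooKeeper session timeout, then comparing the timestamps of the flagged entries with the "ZK expired" log lines, would show whether the two coincide.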