[ https://issues.apache.org/jira/browse/KAFKA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Reddy updated KAFKA-4096:
---------------------------------
    Component/s:     (was: zkclient)
                     (was: consumer)

> Kafka Backup and Recovery
> -------------------------
>
>                 Key: KAFKA-4096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4096
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>         Environment: RHEL 7.2, AWS EC2 compute instance
>            Reporter: Karthik Reddy
>            Assignee: Neha Narkhede
>            Priority: Critical
>
> Hi Team,
> We have seen the below messages in the Kafka logs, indicating there was a timeout on ZK.
> Could you please advise us on how to tune or better optimize the Kafka-ZK communication.
> Kafka and ZK are on separate servers. Currently, we have the ZK timeout set to 6000 ms.
> Kafka servers have EBS volumes as the disk.
> We had to restart our consumers and ZK to resolve this issue.
>
> [2016-03-10 02:29:25,858] INFO Unable to read additional data from server sessionid 0x5531d0003f30030, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2016-03-10 02:29:25,958] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
> [2016-03-10 02:29:26,381] INFO Opening socket connection to server 10.200.77.74/10.200.77.74:8164. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2016-03-10 02:29:26,382] INFO Socket connection established to 10.200.77.74/10.200.77.74:8164, initiating session (org.apache.zookeeper.ClientCnxn)
> [2016-03-10 02:29:26,385] INFO Session establishment complete on server 10.200.77.74/10.200.77.74:8164, sessionid = 0x5531d0003f30030, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2016-03-10 02:29:26,385] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> [2016-03-10 02:29:30,961] INFO conflict in /controller data: {"version":1,"brokerid":3,"timestamp":"1457594970952"} stored data: {"version":1,"brokerid":5,"timestamp":"1457594970043"} (kafka.utils.ZkUtils$)
> [2016-03-10 02:29:30,969] INFO New leader is 5 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2016-03-10 02:29:31,620] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions [__consumer_offsets,0],[fulfillment.payments.autopay.mongooperation.response,1],[__consumer_offsets,20],[__consumer_offsets,40] (kafka.server.ReplicaFetcherManager)
> [2016-03-10 02:29:31,621] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions [efit.framework.notification.error,1],[__consumer_offsets,15],[fulfillment.payments.autopay.processexception.notification,1],[__consumer_offsets,35] (kafka.server.ReplicaFetcherManager)
> [2016-03-10 02:29:31,621] INFO Truncating log efit.framework.notification.error-1 to offset 637. (kafka.log.Log)
> [2016-03-10 02:29:31,621] INFO Truncating log __consumer_offsets-15 to offset 0. (kafka.log.Log)
> [2016-03-10 02:29:31,622] INFO Truncating log fulfillment.payments.autopay.processexception.notification-1 to offset 0. (kafka.log.Log)
> [2016-03-10 02:29:31,622] INFO Truncating log __consumer_offsets-35 to offset 0. (kafka.log.Log)
> [2016-03-10 02:29:31,623] INFO Loading offsets from [__consumer_offsets,0] (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,624] INFO Loading offsets from [__consumer_offsets,20] (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,624] INFO Finished loading offsets from [__consumer_offsets,0] in 1 milliseconds. (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,625] INFO Loading offsets from [__consumer_offsets,40] (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,20] in 1 milliseconds. (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,40] in 0 milliseconds. (kafka.server.OffsetManager)
> [2016-03-10 02:29:31,627] INFO [ReplicaFetcherManager on broker 3] Added fetcher for partitions List([[efit.framework.notification.error,1], initOffset 637 to broker id:1,host:10.200.77.78,port:8165] , [[__consumer_offsets,15], initOffset 0 to broker id:1,host:10.200.77.78,port:8165] , [[fulfillment.payments.autopay.processexception.notification,1], initOffset 0 to broker id:5,host:10.200.75.150,port:8165] , [[__consumer_offsets,35], initOffset 0 to broker id:1,host:10.200.77.78,port:8165] ) (kafka.server.ReplicaFetcherManager)
> [2016-03-10 02:29:31,627] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread
>
> Thanks,
> Karthik

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)