[ https://issues.apache.org/jira/browse/KAFKA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Reddy updated KAFKA-4096:
---------------------------------
Description:
Hi Team,

We are trying to move the data on our Kafka cluster from one region to another. "Region" here could be a separate data center or a separate cluster within the same region.

To do this, we stopped ZK/Kafka on the old cluster, detached the EBS volumes where Kafka stores all topic-related data, and then attached those EBS volumes to the new cluster.

We observed that the new ZK cluster came up with all the data the previous ZK had persisted, meaning all the topic metadata and consumer offset information. However, on the Kafka side, no messages are visible: all the index and log files are empty (zero size). The recovery point and recovery offset checkpoints indicate the correct base offsets, as present in the old cluster.

Apart from using MirrorMaker to move the data for all topics, can you let us know whether there is a specific process for copying file system snapshots from one region to another?

We restarted Kafka/ZK, but that didn't help.

Thanks,
Karthik

was:
Hi Team,

We have seen the messages below in the Kafka logs, indicating a timeout on ZK. Could you please advise us on how to tune or otherwise optimize the Kafka-ZK communication? Kafka and ZK are on separate servers. Currently, the ZK timeout is set to 6000 ms. The Kafka servers use EBS volumes as their disks. We had to restart our consumers and ZK to resolve this issue.

[2016-03-10 02:29:25,858] INFO Unable to read additional data from server sessionid 0x5531d0003f30030, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:25,958] INFO zookeeper state changed (Disconnected) (org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:26,381] INFO Opening socket connection to server 10.200.77.74/10.200.77.74:8164. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,382] INFO Socket connection established to 10.200.77.74/10.200.77.74:8164, initiating session (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO Session establishment complete on server 10.200.77.74/10.200.77.74:8164, sessionid = 0x5531d0003f30030, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:30,961] INFO conflict in /controller data: {"version":1,"brokerid":3,"timestamp":"1457594970952"} stored data: {"version":1,"brokerid":5,"timestamp":"1457594970043"} (kafka.utils.ZkUtils$)
[2016-03-10 02:29:30,969] INFO New leader is 5 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-03-10 02:29:31,620] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions [__consumer_offsets,0],[fulfillment.payments.autopay.mongooperation.response,1],[__consumer_offsets,20],[__consumer_offsets,40] (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO [ReplicaFetcherManager on broker 3] Removed fetcher for partitions [efit.framework.notification.error,1],[__consumer_offsets,15],[fulfillment.payments.autopay.processexception.notification,1],[__consumer_offsets,35] (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO Truncating log efit.framework.notification.error-1 to offset 637. (kafka.log.Log)
[2016-03-10 02:29:31,621] INFO Truncating log __consumer_offsets-15 to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log fulfillment.payments.autopay.processexception.notification-1 to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log __consumer_offsets-35 to offset 0. (kafka.log.Log)
[2016-03-10 02:29:31,623] INFO Loading offsets from [__consumer_offsets,0] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Loading offsets from [__consumer_offsets,20] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Finished loading offsets from [__consumer_offsets,0] in 1 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Loading offsets from [__consumer_offsets,40] (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,20] in 1 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from [__consumer_offsets,40] in 0 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherManager on broker 3] Added fetcher for partitions List([[efit.framework.notification.error,1], initOffset 637 to broker id:1,host:10.200.77.78,port:8165] , [[__consumer_offsets,15], initOffset 0 to broker id:1,host:10.200.77.78,port:8165] , [[fulfillment.payments.autopay.processexception.notification,1], initOffset 0 to broker id:5,host:10.200.75.150,port:8165] , [[__consumer_offsets,35], initOffset 0 to broker id:1,host:10.200.77.78,port:8165] ) (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread

Thanks,
Karthik

> Kafka Backup and Recovery
> -------------------------
>
>                 Key: KAFKA-4096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4096
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>         Environment: RHEL 7.2, AWS EC2 compute instance
>            Reporter: Karthik Reddy
>            Assignee: Neha Narkhede
>            Priority: Critical
>
> Hi Team,
> We are trying to move the data on Kafka Cluster from one region to another
> region. Region here could be a separate data center or a separate cluster
> within the same region.
> In the effort to do this, we have stopped the ZK/Kafka of the old Cluster,
> detached the EBS volumes where Kafka stores all topic-related data and then
> attached the EBS volumes to the new cluster.
> We observed that the new ZK cluster came up with all the data that the previous
> ZK persisted, meaning all the topic metadata and consumer offset information.
> However, on the Kafka side, we noticed that messages are not seen; all the
> index and log files are empty.
> The recovery point and recovery offset checkpoint indicate the correct base
> offset as present in the old cluster.
> Apart from the MirrorMaker strategy to move the data from all the topics, can
> you let us know whether there is any specific process to copy the file system
> snapshots from one region to another?
> We restarted Kafka/ZK, but that didn't help.
> Thanks,
> Karthik

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
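To confirm the reported symptom on each broker of the new cluster (empty .log/.index files while the recovery-point checkpoint still records non-zero offsets), a small script can be run against each log.dirs directory on the attached EBS volume. The following is a minimal, hypothetical sketch; the script and its function names are illustrative only. It assumes the checkpoint file is named recovery-point-offset-checkpoint and uses the layout written by 0.8.x brokers (a version line, an entry count, then "topic partition offset" lines), and that partition directories are named "<topic>-<partition>".

```python
"""Hypothetical helper: compare Kafka partition segment sizes with the
recovery points recorded in a broker log directory's checkpoint file."""
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: checkpoint file name as written by 0.8.x brokers in each log.dirs entry.
CHECKPOINT_NAME = "recovery-point-offset-checkpoint"

# (partition dir name, total .log bytes, total .index bytes, recovery point or None)
Row = Tuple[str, int, int, Optional[int]]


def read_checkpoint(path: Path) -> Dict[str, int]:
    """Parse an offset checkpoint file into {"<topic>-<partition>": offset}.

    ASSUMPTION: the layout is a version line ("0"), an entry-count line,
    then one "topic partition offset" line per entry.
    """
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    if len(lines) < 2:
        return {}
    if lines[0] != "0":
        raise ValueError(f"unexpected checkpoint version {lines[0]!r} in {path}")
    count = int(lines[1])
    offsets: Dict[str, int] = {}
    for line in lines[2:2 + count]:
        topic, partition, offset = line.split()
        # Partition directories are assumed to be named "<topic>-<partition>".
        offsets[f"{topic}-{partition}"] = int(offset)
    return offsets


def inspect_log_dir(log_dir: Path) -> List[Row]:
    """Sum segment and index sizes per partition directory and attach its recovery point."""
    checkpoint = log_dir / CHECKPOINT_NAME
    recovery = read_checkpoint(checkpoint) if checkpoint.exists() else {}
    rows: List[Row] = []
    for part_dir in sorted(p for p in log_dir.iterdir() if p.is_dir()):
        log_bytes = sum(f.stat().st_size for f in part_dir.glob("*.log"))
        index_bytes = sum(f.stat().st_size for f in part_dir.glob("*.index"))
        rows.append((part_dir.name, log_bytes, index_bytes, recovery.get(part_dir.name)))
    return rows


def suspicious(rows: List[Row]) -> List[str]:
    """Partitions whose checkpoint records data but whose .log files are empty."""
    return [name for name, log_bytes, _, rp in rows
            if log_bytes == 0 and rp is not None and rp > 0]


def report(log_dir: Path) -> None:
    """Print one line per partition, then a warning for each suspicious partition."""
    rows = inspect_log_dir(log_dir)
    for name, log_bytes, index_bytes, rp in rows:
        print(f"{name}\tlog={log_bytes}B\tindex={index_bytes}B\trecovery_point={rp}")
    for name in suspicious(rows):
        print(f"WARNING: {name}: recovery point > 0 but .log segments are empty")


if __name__ == "__main__":
    # Pass one or more log.dirs paths from the attached volume (with the broker stopped).
    for arg in sys.argv[1:]:
        report(Path(arg))
```

Usage, with a hypothetical script name and path: `python check_log_dir.py /path/to/kafka-logs`. A WARNING line marks a partition whose checkpoint still reports a non-zero recovery point while its .log segments contain no bytes, which is the mismatch described in this issue.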