[jira] [Updated] (KAFKA-4096) Kafka Backup and Recovery

Karthik Reddy (JIRA) Sun, 28 Aug 2016 20:41:14 -0700

     [ https://issues.apache.org/jira/browse/KAFKA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Karthik Reddy updated KAFKA-4096:
---------------------------------
    Description: 
Hi Team,

We are trying to move the data in our Kafka cluster from one region to
another. "Region" here could be a separate data center or a separate cluster
within the same region.

To do this, we stopped ZK/Kafka on the old cluster, detached the EBS volumes
where Kafka stores all topic-related data, and then attached those EBS volumes
to the new cluster.

We observed that the new ZK cluster came up with all the data the previous ZK
had persisted, i.e. all the topic metadata and consumer offset information.
However, on the Kafka side, no messages are visible: all the index and log
files are empty (zero bytes).
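
One quick way to confirm this symptom is to list the zero-byte segment files in
each partition directory on the re-attached volume. The following is a minimal
Python sketch, not a Kafka tool; it assumes the usual
<log.dirs>/<topic>-<partition>/ layout. Note that a partition that never
received any data also has a zero-byte .log file, so only hits for partitions
that are expected to hold messages are meaningful.

```python
from pathlib import Path


def find_empty_segments(log_dir):
    """Return (partition_dir, file_name) pairs for zero-byte .log/.index files.

    Assumes one sub-directory per partition, e.g.
    <log_dir>/<topic>-<partition>/00000000000000000000.log
    """
    empty = []
    for partition in sorted(Path(log_dir).iterdir()):
        if not partition.is_dir():
            # Checkpoint files live at the top level of the log dir; skip them.
            continue
        for segment in sorted(partition.iterdir()):
            if segment.suffix in (".log", ".index") and segment.stat().st_size == 0:
                empty.append((partition.name, segment.name))
    return empty
```

Run it on the new host against each directory listed in the broker's log.dirs
setting (for example a hypothetical /var/kafka-logs).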

The recovery point and recovery offset checkpoints still show the correct base
offsets, as they were in the old cluster.
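
To cross-check the recovery points against what is actually on disk, the
checkpoint file can be parsed and compared with the segment sizes. The
following is a Python sketch that assumes the version-0 text format Kafka 0.8.x
writes to recovery-point-offset-checkpoint in each log directory: a version
line, an entry-count line, and then one "topic partition offset" line per
partition. The helper names are hypothetical.

```python
from pathlib import Path


def read_offset_checkpoint(path):
    """Parse an offset checkpoint file into {(topic, partition): offset}.

    Assumed format: line 1 = version (0), line 2 = entry count,
    then one "topic partition offset" line per entry.
    """
    lines = Path(path).read_text().splitlines()
    version, count = int(lines[0]), int(lines[1])
    if version != 0:
        raise ValueError(f"unsupported checkpoint version {version}")
    offsets = {}
    for line in lines[2:2 + count]:
        topic, partition, offset = line.split()
        offsets[(topic, int(partition))] = int(offset)
    if len(offsets) != count:
        raise ValueError(f"expected {count} entries, found {len(offsets)}")
    return offsets


def checkpointed_but_empty(log_dir):
    """List partitions whose recovery point is > 0 but whose .log files hold no bytes."""
    log_dir = Path(log_dir)
    points = read_offset_checkpoint(log_dir / "recovery-point-offset-checkpoint")
    suspect = []
    for (topic, partition), offset in sorted(points.items()):
        pdir = log_dir / f"{topic}-{partition}"
        # Total size of all segment data files for this partition.
        size = sum(f.stat().st_size for f in pdir.glob("*.log")) if pdir.is_dir() else 0
        if offset > 0 and size == 0:
            suspect.append((topic, partition, offset))
    return suspect
```

A non-empty result means the checkpoint records data that is no longer present
in the segment files, which matches the situation described here.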

Apart from using MirrorMaker to move the data for all the topics, can you let
us know whether there is a specific process for copying file system snapshots
from one region to another?
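
Whatever copy method is used (a moved EBS volume or a restored snapshot), one
way to check that the data arrived intact before starting the new broker is to
compare a file-size manifest of the Kafka data directory on both sides. The
following is a hypothetical Python sketch, not an official Kafka procedure. It
assumes both sides are read while their brokers are stopped, because index
files of active segments can legitimately differ in size while a broker is
running.

```python
from pathlib import Path


def build_manifest(log_dir):
    """Map each file's path (relative to log_dir, POSIX style) to its size in bytes."""
    root = Path(log_dir)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_manifests(source, target):
    """Return (files missing from target, files present in both with different sizes)."""
    missing = sorted(set(source) - set(target))
    changed = sorted(
        name for name in set(source) & set(target) if source[name] != target[name]
    )
    return missing, changed
```

Both lists should be empty for a faithful copy. Any .log file listed under
"changed" with a size of 0 on the target side would point to the problem
described above.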

We restarted Kafka/ZK, but that didn't help.


Thanks,
Karthik

  was:
Hi Team,

We have seen the messages below in the Kafka logs, indicating there was a
timeout on ZK.

Could you please advise us on how to tune or better optimize the Kafka-ZK 
communication.

Kafka and ZK are on separate servers. Currently, we have the ZK timeout set
to 6000 ms.
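
The "negotiated timeout = 6000" in the log below comes from the ZooKeeper
server clamping the session timeout requested by the broker (set on the Kafka
side with zookeeper.session.timeout.ms) to the server's
[minSessionTimeout, maxSessionTimeout] range. Unless configured otherwise,
that range defaults to [2 * tickTime, 20 * tickTime]. The following is a small
Python sketch of that rule. The tickTime of 2000 ms is an assumed value, so
check the actual zoo.cfg.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a client's requested session timeout as a ZooKeeper server does.

    min_ms/max_ms stand for minSessionTimeout/maxSessionTimeout; when unset
    they default to 2 * tickTime and 20 * tickTime respectively.
    """
    lo = min_ms if min_ms is not None else 2 * tick_time_ms
    hi = max_ms if max_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))
```

With tickTime = 2000 ms, a requested 6000 ms falls inside [4000, 40000] and is
granted unchanged, while a requested 60000 ms would be cut to 40000 ms. Raising
zookeeper.session.timeout.ms beyond 20 * tickTime therefore only helps if
maxSessionTimeout is raised on the ZooKeeper servers as well.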
Kafka servers have EBS volumes as the disk.

We had to restart our consumers and ZK to resolve this issue.

[2016-03-10 02:29:25,858] INFO Unable to read additional data from server 
sessionid 0x5531d0003f30030, likely server has closed socket, closing socket 
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:25,958] INFO zookeeper state changed (Disconnected) 
(org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:26,381] INFO Opening socket connection to server 
10.200.77.74/10.200.77.74:8164. Will not attempt to authenticate using SASL 
(unknown error) (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,382] INFO Socket connection established to 
10.200.77.74/10.200.77.74:8164, initiating session 
(org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO Session establishment complete on server 
10.200.77.74/10.200.77.74:8164, sessionid = 0x5531d0003f30030, negotiated 
timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2016-03-10 02:29:26,385] INFO zookeeper state changed (SyncConnected) 
(org.I0Itec.zkclient.ZkClient)
[2016-03-10 02:29:30,961] INFO conflict in /controller data: 
{"version":1,"brokerid":3,"timestamp":"1457594970952"} stored data: 
{"version":1,"brokerid":5,"timestamp":"1457594970043"} (kafka.utils.ZkUtils$)
[2016-03-10 02:29:30,969] INFO New leader is 5 
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2016-03-10 02:29:31,620] INFO [ReplicaFetcherManager on broker 3] Removed 
fetcher for partitions 
[__consumer_offsets,0],[fulfillment.payments.autopay.mongooperation.response,1],[__consumer_offsets,20],[__consumer_offsets,40]
 (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO [ReplicaFetcherManager on broker 3] Removed 
fetcher for partitions 
[efit.framework.notification.error,1],[__consumer_offsets,15],[fulfillment.payments.autopay.processexception.notification,1],[__consumer_offsets,35]
 (kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,621] INFO Truncating log 
efit.framework.notification.error-1 to offset 637. (kafka.log.Log)
[2016-03-10 02:29:31,621] INFO Truncating log __consumer_offsets-15 to offset 
0. (kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log 
fulfillment.payments.autopay.processexception.notification-1 to offset 0. 
(kafka.log.Log)
[2016-03-10 02:29:31,622] INFO Truncating log __consumer_offsets-35 to offset 
0. (kafka.log.Log)
[2016-03-10 02:29:31,623] INFO Loading offsets from [__consumer_offsets,0] 
(kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Loading offsets from [__consumer_offsets,20] 
(kafka.server.OffsetManager)
[2016-03-10 02:29:31,624] INFO Finished loading offsets from 
[__consumer_offsets,0] in 1 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Loading offsets from [__consumer_offsets,40] 
(kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from 
[__consumer_offsets,20] in 1 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,625] INFO Finished loading offsets from 
[__consumer_offsets,40] in 0 milliseconds. (kafka.server.OffsetManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherManager on broker 3] Added 
fetcher for partitions List([[efit.framework.notification.error,1], initOffset 
637 to broker id:1,host:10.200.77.78,port:8165] , [[__consumer_offsets,15], 
initOffset 0 to broker id:1,host:10.200.77.78,port:8165] , 
[[fulfillment.payments.autopay.processexception.notification,1], initOffset 0 
to broker id:5,host:10.200.75.150,port:8165] , [[__consumer_offsets,35], 
initOffset 0 to broker id:1,host:10.200.77.78,port:8165] ) 
(kafka.server.ReplicaFetcherManager)
[2016-03-10 02:29:31,627] INFO [ReplicaFetcherThread-0-2], Shutting down 
(kafka.server.ReplicaFetcherThread
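
To measure how long each ZooKeeper disconnect lasted, the ZkClient
state-change messages can be paired up. The following is a Python sketch that
assumes one log entry per line. The excerpt above is wrapped by email, so each
entry has to be re-joined first. The function name is hypothetical.

```python
import re
from datetime import datetime

# Matches the "[2016-03-10 02:29:25,958] INFO ..." prefix of each log entry.
ENTRY = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] (\w+) (.*)$")


def disconnect_gaps(lines):
    """Yield (disconnected_at, reconnected_at, gap_ms) for each ZK disconnect.

    Uses ZkClient's "zookeeper state changed (Disconnected)" and
    "(SyncConnected)" messages as the start and end of each gap.
    """
    down_since = None
    for line in lines:
        m = ENTRY.match(line)
        if not m:
            continue  # not the start of a log entry
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        msg = m.group(3)
        if "state changed (Disconnected)" in msg:
            down_since = ts
        elif "state changed (SyncConnected)" in msg and down_since is not None:
            gap_ms = (ts - down_since).total_seconds() * 1000
            yield down_since, ts, round(gap_ms)
            down_since = None
```

For the Disconnected entry at 02:29:25,958 and the SyncConnected entry at
02:29:26,385 above, this reports a gap of 427 ms.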

Thanks,
Karthik


> Kafka Backup and Recovery
> -------------------------
>
>                 Key: KAFKA-4096
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4096
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>         Environment: RHEL 7.2, AWS EC2 compute instance
>            Reporter: Karthik Reddy
>            Assignee: Neha Narkhede
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
