[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371202#comment-17371202 ]

l0co commented on KAFKA-2729:
-----------------------------

[~junrao] thanks for the reply. Unfortunately, from the logs preserved from this 
breakdown, the only useful entries I have are these:
{code:java}
[2021-06-22 14:06:50,637] INFO 1/kafka0/server.log.2021-06-22-14: [Partition 
__consumer_offsets-30 broker=0] __consumer_offsets-30 starts at Leader Epoch 
117 from offset 2612283. Previous Leader Epoch was: 116 
(kafka.cluster.Partition)
[2021-06-22 14:07:04,184] INFO 1/kafka1/server.log.2021-06-22-14: [Partition 
__consumer_offsets-30 broker=1] Shrinking ISR from 1,2,0 to 1,2 
(kafka.cluster.Partition)
[2021-06-22 14:07:04,186] INFO 1/kafka1/server.log.2021-06-22-14: [Partition 
__consumer_offsets-30 broker=1] Cached zkVersion [212] not equal to that in 
zookeeper, skip updating ISR (kafka.cluster.Partition)
[2021-06-22 14:07:09,146] INFO 1/kafka1/server.log.2021-06-22-14: [Partition 
__consumer_offsets-30 broker=1] Shrinking ISR from 1,2,0 to 1,2 
(kafka.cluster.Partition)
[2021-06-22 14:07:09,147] INFO 1/kafka1/server.log.2021-06-22-14: [Partition 
__consumer_offsets-30 broker=1] Cached zkVersion [212] not equal to that in 
zookeeper, skip updating ISR (kafka.cluster.Partition)
{code}
After the ZooKeeper reconnection on kafka0, kafka0 becomes the leader with 
epoch 117, and then kafka1 starts to complain that its cached zkVersion [212] 
no longer matches the one in ZooKeeper, which is a greater number. What does 
this suggest to you? We suspect that kafka0's ZooKeeper node was disconnected 
from the kafka1 and kafka2 ZooKeeper nodes and formed its own separate quorum, 
and that after all the ZooKeeper nodes rejoined one cluster, their state became 
inconsistent. Does that make sense to you?
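For context, here is a minimal sketch of why a stale cached zkVersion makes the 
broker skip the ISR update. It uses the plain ZooKeeper Java client and a 
hypothetical helper, not Kafka's actual code: the write is conditional on the 
expected znode version, so once the controller has bumped the partition state 
znode (e.g. when electing leader epoch 117), every attempt with the old cached 
version fails until the broker's cache is refreshed.
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ConditionalIsrUpdateSketch {
    // Hypothetical helper for illustration only: tries to write the new
    // partition state using the zkVersion the broker has cached.
    static boolean tryShrinkIsr(ZooKeeper zk, String statePath,
                                byte[] newLeaderAndIsr, int cachedZkVersion)
            throws KeeperException, InterruptedException {
        try {
            // setData is a conditional (compare-and-set) write: it only succeeds
            // if the znode's current version equals cachedZkVersion (e.g. 212).
            zk.setData(statePath, newLeaderAndIsr, cachedZkVersion);
            return true;
        } catch (KeeperException.BadVersionException e) {
            // Someone else has already bumped the znode, so our cached version
            // is stale: "Cached zkVersion not equal to that in zookeeper,
            // skip updating ISR".
            return false;
        }
    }
}
{code}
In our logs above, kafka1 keeps retrying with the stale cached version 212, 
which would explain the repeated "skip updating ISR" lines.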

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2729
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.10.1.0, 0.11.0.0, 2.4.1
>            Reporter: Danil Serdyuchenko
>            Assignee: Onur Karaman
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> After a small network wobble where the ZooKeeper nodes couldn't reach each 
> other, we started seeing a large number of under-replicated partitions. The 
> ZooKeeper cluster recovered; however, we continued to see a large number of 
> under-replicated partitions. Two brokers in the Kafka cluster were showing 
> this in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This happened for all of the topics on the affected brokers. Both brokers 
> only recovered after a restart. Our own investigation yielded nothing; I was 
> hoping you could shed some light on this issue. Possibly it's related to 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
