[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

Sam Nguyen (JIRA) Wed, 29 Mar 2017 10:00:06 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947500#comment-15947500
 ]


Sam Nguyen commented on KAFKA-2729:
-----------------------------------

We ran into this today on kafka_2.11-0.10.0.1.

There is unexpected behavior with regard to partition availability.  One of 
the 3 brokers in our cluster entered this state (emitting "Cached 
zkVersion [140] not equal to that in zookeeper, skip updating ISR" errors).
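
For anyone checking whether a broker is in the same state, here is a minimal 
sketch that counts these errors per partition in a broker log.  The regex is 
an assumption based on the log lines quoted in the issue description below 
(the exact format may differ between Kafka versions), and 
count_zkversion_errors is a hypothetical helper, not part of Kafka.

```python
import re
from collections import Counter

# Pattern derived from the log line quoted in the issue description;
# treat it as an assumption, not a stable log format.
PATTERN = re.compile(
    r"Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"on broker (?P<broker>\d+): "
    r"Cached zkVersion \[(?P<version>\d+)\] not equal to that in zookeeper"
)


def count_zkversion_errors(lines):
    """Count 'Cached zkVersion' errors per (topic, partition, broker)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (
                match["topic"],
                int(match["partition"]),
                int(match["broker"]),
            )
            counts[key] += 1
    return counts


# The two log lines from the issue description, unwrapped to one line each.
sample = [
    "[2015-10-27 11:36:00,888] INFO Partition "
    "[__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for "
    "partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 "
    "(kafka.cluster.Partition)",
    "[2015-10-27 11:36:00,891] INFO Partition "
    "[__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] "
    "not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)",
]

for key, count in count_zkversion_errors(sample).items():
    print(key, count)
```

A count that keeps growing for the same partitions would suggest the broker 
is not recovering on its own.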

We have our producer "required acks" config set to wait for all (-1), and 
min.insync.replicas set to 2.  I would have expected to still be able to 
produce to the topic, but our producer (sarama) was getting timeouts.  
After restarting the broken broker, we were able to continue producing.

I confirmed that even after performing a graceful shutdown on 1 of the 3 
brokers, we are still able to produce, since 2 of the 3 brokers are still 
alive to serve and acknowledge produce requests.
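
To illustrate the difference between the two cases, here is a toy model of 
how a produce request with acks=-1 is decided.  It is only a sketch: 
produce_outcome and the broker ids are hypothetical, and the idea that the 
stuck broker kept a stale ISR is one possible reading of the "skip updating 
ISR" message, not something confirmed in this thread.

```python
def produce_outcome(isr, replicating, min_insync_replicas):
    """Toy model of a produce request sent with acks=all (-1).

    isr: broker ids the partition leader currently lists as in sync.
    replicating: broker ids that are actually keeping up with the leader.
    """
    # With acks=all the leader rejects the write if the ISR is too small.
    if len(isr) < min_insync_replicas:
        return "rejected: not enough in-sync replicas"
    # Otherwise it waits for every member of the current ISR.
    if not set(isr) <= set(replicating):
        return "timeout: waiting on an ISR member that is not replicating"
    return "ok"


# Graceful shutdown of broker 3: the ISR shrinks to {1, 2}, which still
# satisfies min.insync.replicas=2.
print(produce_outcome({1, 2}, {1, 2}, 2))

# Assumed stuck state: the ISR shrink is skipped, so the cached ISR still
# lists broker 3 even though it is no longer replicating.
print(produce_outcome({1, 2, 3}, {1, 2}, 2))
```

Under these assumptions the first call succeeds and the second times out, 
which matches what we observed with sarama.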

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2729
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of under-replicated partitions. The zookeeper 
> cluster recovered, but we continued to see a large number of 
> under-replicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> This happened for all of the topics on the affected brokers. Both brokers only 
> recovered after a restart. Our own investigation yielded nothing; I was hoping 
> you could shed some light on this issue, possibly whether it's related to 
> https://issues.apache.org/jira/browse/KAFKA-1382 , although we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
