[ https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947500#comment-15947500 ]
Sam Nguyen commented on KAFKA-2729:
-----------------------------------

We ran into this today on kafka_2.11-0.10.0.1. There is unexpected behavior with regard to partition availability.

One out of the 3 brokers in our cluster entered this state (emitting "Cached zkVersion [140] not equal to that in zookeeper, skip updating ISR" errors). Our producer's "required acks" config is set to wait for all (-1), and min.insync.replicas is set to 2. I would have expected to still be able to produce to the topic, but our producer (sarama) was getting timeouts. After restarting the broken broker, we were able to continue producing.

I confirmed that even after a graceful shutdown of 1 out of 3 brokers, we are still able to produce, since the 2 remaining brokers are alive to serve and acknowledge produce requests.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-2729
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2729
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other,
> we started seeing a large number of under-replicated partitions. The zookeeper
> cluster recovered, however we continued to see a large number of
> under-replicated partitions. Two brokers in the kafka cluster were showing this
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> We saw this for all of the topics on the affected brokers.
> Both brokers only recovered after a restart. Our own investigation yielded
> nothing; I was hoping you could shed some light on this issue. Possibly it's
> related to https://issues.apache.org/jira/browse/KAFKA-1382, although we're
> using 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
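The availability expectation in the comment (acks=-1, min.insync.replicas=2, 3 brokers) can be illustrated with a simplified model. This is a hypothetical sketch, not Kafka's code: the function name `produce_outcome`, its parameters, and the broker ids are illustrative assumptions. It encodes two documented rules: with acks=all the write is rejected if the ISR is smaller than min.insync.replicas, and otherwise the leader waits for every replica currently in the ISR. The "stale zkVersion" case is an assumed, unconfirmed reading of the quoted log, where the ISR shrink is skipped and the leader keeps waiting on a replica that is not replicating.

```python
# Hypothetical, simplified model of an acks=all (-1) produce request.
# Not Kafka's implementation; names and broker ids are illustrative only.

def produce_outcome(isr, fetching_replicas, min_insync_replicas):
    """Return the outcome of an acks=all produce request.

    isr: broker ids the leader currently records as in-sync (incl. itself).
    fetching_replicas: broker ids that actually replicate the new message.
    min_insync_replicas: the min.insync.replicas setting.
    """
    # With acks=all, the write is rejected if the ISR is too small.
    if len(isr) < min_insync_replicas:
        return "NOT_ENOUGH_REPLICAS"
    # Otherwise the leader waits until every replica in its ISR has the message.
    if set(isr) <= set(fetching_replicas):
        return "OK"
    # Some ISR member never catches up, so the request eventually times out.
    return "TIMEOUT"


# Graceful shutdown of broker 3: the ISR is shrunk to {1, 2}.
print(produce_outcome(isr={1, 2}, fetching_replicas={1, 2}, min_insync_replicas=2))

# Assumed stuck case: the ISR update is skipped, so broker 3 stays listed
# as in-sync even though it is not replicating.
print(produce_outcome(isr={1, 2, 3}, fetching_replicas={1, 2}, min_insync_replicas=2))
```

Under these assumptions the first call prints `OK` and the second prints `TIMEOUT`, matching the graceful-shutdown and stuck-broker behavior described in the comment.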