[
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247802#comment-15247802
]
Flavio Junqueira commented on KAFKA-3042:
-----------------------------------------
[~junrao] It makes sense, thanks for the analysis. Trying to reconstruct the
problem in steps, this is what's going on:
# Broker 5 thinks broker 4 is alive and sends a LeaderAndIsr request to broker
1 with 4 as the leader.
# Broker 1 doesn't have 4 cached as a live broker, so it fails the request to
make it a follower of the partition.
The LeaderAndIsr request has a list of live leaders, and I suppose 4 is in that
list.
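To make step 2 concrete, here is a minimal sketch of the failing check. The
names (liveBrokerIds, handleLeaderAndIsr) are made up for illustration and this
is not the actual broker code path:
{code}
// Minimal sketch of the step-2 failure; names are placeholders, not real Kafka code.
object LeaderAndIsrSketch {
  // Broker 1's cached view of live brokers: broker 4 is missing.
  var liveBrokerIds: Set[Int] = Set(1, 2, 3, 5)

  def becomeFollower(topicPartition: String, leaderId: Int): Unit =
    println(s"$topicPartition: now following broker $leaderId")

  // Handle a LeaderAndIsr request that names leaderId as the new leader.
  def handleLeaderAndIsr(topicPartition: String, leaderId: Int): Boolean =
    if (liveBrokerIds.contains(leaderId)) {
      becomeFollower(topicPartition, leaderId)
      true
    } else {
      // Step 2: leader 4 is not in the cached live set, so the request is failed.
      println(s"$topicPartition: leader $leaderId not in live broker cache, failing request")
      false
    }
}
{code}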
To sort this out, I can see two options:
# We simply update the metadata cache upon receiving a LeaderAndIsr request,
using the request's list of live leaders. The update needs to be the union of
the current set with that set of live leaders.
# You also suggested sending an UpdateMetadata request first to update the set
of live brokers.
I can't see any problem with 1, and no immediate problem with 2 either, but I'm
concerned about running into another race condition if we send the update
first. What do you think?
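For reference, a rough sketch of what option 1 could look like: keep the
cache's live-broker set as the union of what it already holds and the live
leaders carried by the LeaderAndIsr request. The class and method names below
are made up for illustration:
{code}
// Sketch of option 1: merge the request's live leaders into the cached live-broker set.
case class BrokerEndpoint(id: Int, host: String, port: Int)

class MetadataCacheSketch(initialAliveBrokers: Set[BrokerEndpoint]) {
  private var aliveBrokers: Set[BrokerEndpoint] = initialAliveBrokers

  // Option 1: take the union rather than dropping brokers we already know about.
  def updateFromLeaderAndIsr(liveLeaders: Set[BrokerEndpoint]): Unit = synchronized {
    aliveBrokers = aliveBrokers union liveLeaders
  }

  def isAlive(brokerId: Int): Boolean = synchronized {
    aliveBrokers.exists(_.id == brokerId)
  }
}
{code}
With something along these lines, broker 1 would learn about broker 4 from the
live-leaders list in the request itself, so the become-follower transition in
step 2 would no longer be rejected.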
> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
> Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01,
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR"
> I think this is because the broker considers itself the leader when in fact
> it's a follower.
> So after several failed tries, it needs to find out who the leader is.
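For context on the log line quoted above: the ISR update is a conditional
(compare-and-set) write against ZooKeeper. The sketch below only illustrates
that pattern with an in-memory stand-in; it is not the actual Kafka source.
{code}
// Illustration of the conditional ISR update; the in-memory "znode" is a stand-in.
object IsrUpdateSketch {
  // Stand-in for the partition state znode: (ISR as a string, zkVersion).
  private var znode: (String, Int) = ("1,2,3", 55)

  // Succeeds only if the caller's expected version matches the stored one.
  def conditionalUpdate(newIsr: String, expectedVersion: Int): (Boolean, Int) = synchronized {
    if (znode._2 == expectedVersion) { znode = (newIsr, expectedVersion + 1); (true, znode._2) }
    else (false, znode._2)
  }

  def updateIsr(newIsr: String, cachedZkVersion: Int): Option[Int] = {
    val (succeeded, newVersion) = conditionalUpdate(newIsr, cachedZkVersion)
    if (succeeded) Some(newVersion)
    else {
      // A stale cached version (e.g. the broker is no longer really the leader) means the
      // write is skipped; retrying without refreshing the cache repeats this message forever.
      println(s"Cached zkVersion $cachedZkVersion not equal to that in zookeeper, skip updating ISR")
      None
    }
  }
}
{code}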