[
https://issues.apache.org/jira/browse/KAFKA-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241314#comment-15241314
]
Flavio Junqueira commented on KAFKA-3042:
-----------------------------------------
[~junrao] In this comment:
https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15236055&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15236055
I showed that broker 5 is the one that sent the LeaderAndIsr request to broker
1, and here:
https://issues.apache.org/jira/browse/KAFKA-3042?focusedCommentId=15237383&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15237383
that broker 5 also didn't have broker 4 as a live broker when it sent the
request to broker 1. It does sound right that on failover the controller should
update the list of live brokers on the other brokers before sending requests
that make them followers, or at least the problem should be transient in the
sense that it could be corrected by a later message. However, for the partition
we are analyzing there is the additional problem that controller 5 itself
didn't have broker 4 in its list of live brokers.
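To make the ordering concrete, here is a minimal sketch of the behavior I mean
(all names are made up for illustration, not the actual KafkaController API):
push the live-broker view to every broker first, and only then send the
requests that make brokers leaders or followers.
{noformat}
object FailoverOrderingSketch {

  // Simplified stand-ins for the controller-to-broker requests.
  case class UpdateMetadata(liveBrokers: Set[Int])
  case class LeaderAndIsr(partition: String, leader: Int, isr: Seq[Int])

  // Stand-in for the controller-to-broker channel.
  def send(brokerId: Int, request: Any): Unit =
    println(s"to broker $brokerId: $request")

  def onControllerFailover(liveBrokers: Set[Int],
                           assignments: Map[String, (Int, Seq[Int])]): Unit = {
    // 1. Every live broker learns the current live-broker set first, so a
    //    broker that is later told to follow a leader has already seen that
    //    leader as alive.
    liveBrokers.foreach(b => send(b, UpdateMetadata(liveBrokers)))

    // 2. Only then send the requests that make brokers leaders or followers.
    for ((partition, (leader, isr)) <- assignments; replica <- isr)
      send(replica, LeaderAndIsr(partition, leader, isr))
  }

  def main(args: Array[String]): Unit =
    onControllerFailover(
      liveBrokers = Set(1, 2, 4, 5),
      assignments = Map(("topic-0", (4, Seq(1, 4)))))
}
{noformat}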
Interestingly, I also caught an instance of this:
{noformat}
[2016-04-09 00:37:54,111] DEBUG Sending MetadataRequest to
Brokers:ArrayBuffer(2, 5)...
[2016-04-09 00:37:54,111] ERROR Haven't been able to send metadata update
requests...
[2016-04-09 00:37:54,112] ERROR [Controller 5]: Forcing the controller to
resign (kafka.controller.KafkaController)
{noformat}
I don't think this is related, but in another issue we have been wondering
about the possible causes of batches in {{ControllerBrokerRequestBatch}} not
being empty, and there are a few occurrences of that in these logs. It happens,
however, right after the controller resigns, so I'm guessing it is related to
the controller shutting down:
{noformat}
[2016-04-09 00:37:54,064] INFO [Controller 5]: Broker 5 resigned as the
controller (kafka.controller.KafkaController)
{noformat}
In any case, for this last issue, I'll create a JIRA to make sure that we have
enough info to identify it when it happens. Currently, the exception is
propagated, but we never log the cause.
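As a rough illustration of the kind of logging I have in mind (the class and
maps below are simplified stand-ins, not the real ControllerBrokerRequestBatch
fields), the non-empty check could include the leftover contents in the
exception message, so that whoever ends up logging the propagated exception
also logs what was still sitting in the batch:
{noformat}
class RequestBatchSketch {
  import scala.collection.mutable

  // Simplified stand-ins for the per-broker request maps.
  private val leaderAndIsrRequests = mutable.Map.empty[Int, String]
  private val updateMetadataRequests = mutable.Map.empty[Int, String]

  def addLeaderAndIsrRequest(brokerId: Int, request: String): Unit =
    leaderAndIsrRequests += brokerId -> request

  def newBatch(): Unit = {
    // Put the leftover contents into the message, so the cause is visible
    // wherever the exception is eventually logged.
    if (leaderAndIsrRequests.nonEmpty || updateMetadataRequests.nonEmpty)
      throw new IllegalStateException(
        "Creating a new request batch while a previous one is unsent. " +
          s"Leftover LeaderAndIsr requests: $leaderAndIsrRequests, " +
          s"leftover UpdateMetadata requests: $updateMetadataRequests")
  }
}
{noformat}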
> updateIsr should stop after failed several times due to zkVersion issue
> -----------------------------------------------------------------------
>
> Key: KAFKA-3042
> URL: https://issues.apache.org/jira/browse/KAFKA-3042
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Environment: jdk 1.7
> centos 6.4
> Reporter: Jiahongchao
> Fix For: 0.10.0.0
>
> Attachments: controller.log, server.log.2016-03-23-01,
> state-change.log
>
>
> Sometimes one broker may repeatedly log
> "Cached zkVersion 54 not equal to that in zookeeper, skip updating ISR".
> I think this is because the broker considers itself the leader when in fact
> it's a follower.
> So after several failed tries, it needs to find out who is the leader.
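For illustration, here is a minimal sketch of the behavior the description asks
for, assuming the ISR update is a conditional ZooKeeper write keyed on the
cached zkVersion (all names below are made up, not the actual updateIsr code):
retry a bounded number of times, then stop and re-check who the leader actually
is instead of looping forever with a stale cache.
{noformat}
object BoundedIsrUpdateSketch {

  final case class LeaderAndIsrState(leader: Int, isr: Seq[Int], zkVersion: Int)

  // Stand-in for a conditional ZooKeeper write: it succeeds only if the
  // caller's cached zkVersion matches the version currently stored.
  class FakeZk(private var state: LeaderAndIsrState) {
    def conditionalUpdate(newIsr: Seq[Int], cachedVersion: Int): Option[Int] =
      synchronized {
        if (cachedVersion == state.zkVersion) {
          state = state.copy(isr = newIsr, zkVersion = state.zkVersion + 1)
          Some(state.zkVersion)
        } else None
      }
    def read(): LeaderAndIsrState = synchronized(state)
  }

  def updateIsrWithBoundedRetries(zk: FakeZk, localBrokerId: Int, newIsr: Seq[Int],
                                  cachedVersion: Int, maxAttempts: Int = 3): Boolean = {
    var attempts = 0
    while (attempts < maxAttempts) {
      zk.conditionalUpdate(newIsr, cachedVersion) match {
        case Some(newVersion) =>
          println(s"ISR updated, new zkVersion $newVersion")
          return true
        case None =>
          attempts += 1
          println(s"Cached zkVersion $cachedVersion not equal to that in zookeeper, " +
            s"skip updating ISR (attempt $attempts of $maxAttempts)")
      }
    }
    // Instead of retrying forever with a stale cache, re-read the state to
    // find out who the leader actually is; a persistent version mismatch
    // usually means this broker is no longer the leader.
    val current = zk.read()
    if (current.leader != localBrokerId)
      println(s"Broker $localBrokerId is not the leader (leader is ${current.leader}); giving up")
    false
  }

  def main(args: Array[String]): Unit = {
    val zk = new FakeZk(LeaderAndIsrState(leader = 2, isr = Seq(1, 2), zkVersion = 55))
    // Broker 1's cache (zkVersion 54) is stale, mirroring the log line above.
    updateIsrWithBoundedRetries(zk, localBrokerId = 1, newIsr = Seq(1), cachedVersion = 54)
  }
}
{noformat}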
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)