[ 
https://issues.apache.org/jira/browse/KAFKA-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenChun Pan updated KAFKA-13564:
---------------------------------
    Description: 
The machine hosting broker0 went down, and leadership of some partitions moved 
to broker1. We found messages like the following in state-change.log:

[2021-12-11 15:34:14,868] TRACE [Broker id=0] Cached leader info 
UpdateMetadataPartitionState(topicName='ceae-1002-flink-characteristic-instance-data',
 partitionIndex=0, controllerEpoch=3, leader=1, leaderEpoch=8, isr=[1], 
zkVersion****{*}#{*}#*****, offlineReplicas=[]) for partition 
ceae-1002-flink-characteristic-instance-data-0 in response to UpdateMetadata 
request sent by controller 2 epoch 3 with correlation id 0 (state.change.logger)

But server.log keeps printing messages like the following:

[2021-12-11 15:34:30,272] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=6] Retrying leaderEpoch request for partition 
ceae-1002-flink-characteristic-instance-data-0 as the leader reported an error: 
NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)

The producer also stopped working, and the client printed the messages below:

[2021-12-11 16:00:00,703] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=4] Retrying leaderEpoch request for partition 
ceae-1002-flink-characteristic-instance-data-0 as the leader reported an error: 
NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)

We brought broker0 back up, but that did not help. We then restarted all 
brokers in the Kafka cluster, which resolved the problem.

  was:
The machine hosting broker0 went down, and leadership of some partitions moved 
to broker1. We found messages like the following in state-change.log:

[2021-12-11 15:34:14,868] TRACE [Broker id=0] Cached leader info 
UpdateMetadataPartitionState(topicName='ceae-1002-flink-characteristic-instance-data',
 partitionIndex=0, controllerEpoch=3, leader=1, leaderEpoch=8, isr=[1], 
zkVersion*****#*#*****, offlineReplicas=[]) for partition 
ceae-1002-flink-characteristic-instance-data-0 in response to UpdateMetadata 
request sent by controller 2 epoch 3 with correlation id 0 (state.change.logger)

But server.log keeps printing messages like the following:

[2021-12-11 15:34:30,272] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=6] Retrying leaderEpoch request for partition 
ceae-1002-flink-characteristic-instance-data-0 as the leader reported an error: 
NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)

The producer also stopped working and printed the messages below:

[2021-12-11 16:00:00,703] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
fetcherId=4] Retrying leaderEpoch request for partition 
ceae-1002-flink-characteristic-instance-data-0 as the leader reported an error: 
NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)

We brought broker0 back up, but that did not help. We then restarted all 
brokers in the Kafka cluster, which resolved the problem.


> Kafka keep print NOT_LEADER_OR_FOLLOWER in log file after one broker dropped, 
> and the producer can not work.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13564
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13564
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: ZhenChun Pan
>            Priority: Major
>
> The machine hosting broker0 went down, and leadership of some partitions 
> moved to broker1. We found messages like the following in state-change.log:
> [2021-12-11 15:34:14,868] TRACE [Broker id=0] Cached leader info 
> UpdateMetadataPartitionState(topicName='ceae-1002-flink-characteristic-instance-data',
>  partitionIndex=0, controllerEpoch=3, leader=1, leaderEpoch=8, isr=[1], 
> zkVersion****{*}#{*}#*****, offlineReplicas=[]) for partition 
> ceae-1002-flink-characteristic-instance-data-0 in response to UpdateMetadata 
> request sent by controller 2 epoch 3 with correlation id 0 
> (state.change.logger)
> But server.log keeps printing messages like the following:
> [2021-12-11 15:34:30,272] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=6] Retrying leaderEpoch request for partition 
> ceae-1002-flink-characteristic-instance-data-0 as the leader reported an 
> error: NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)
> The producer also stopped working, and the client printed the messages below:
> [2021-12-11 16:00:00,703] INFO [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=4] Retrying leaderEpoch request for partition 
> ceae-1002-flink-characteristic-instance-data-0 as the leader reported an 
> error: NOT_LEADER_OR_FOLLOWER (kafka.server.ReplicaFetcherThread)
> We brought broker0 back up, but that did not help. We then restarted all 
> brokers in the Kafka cluster, which resolved the problem.
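
To gauge how persistent the retries are, the ReplicaFetcher messages quoted above can be counted per partition. The following is a minimal sketch, not part of the original report: the function name `count_retries` and the regular expression are assumptions for illustration, derived only from the log format shown in this issue.

```python
import re
from collections import Counter

# Pattern for the ReplicaFetcher retry message quoted in the report.
# It is an assumption based on the sample lines, not on Kafka's source.
RETRY_RE = re.compile(
    r"\[ReplicaFetcher replicaId=(?P<replica>\d+), leaderId=(?P<leader>\d+), "
    r"fetcherId=\d+\] Retrying leaderEpoch request for partition "
    r"(?P<partition>\S+) as the leader reported an error: "
    r"(?P<error>[A-Z_]+)"
)


def count_retries(log_text):
    """Count retry messages per (partition, leaderId, error) in log text."""
    # The lines in the report are wrapped, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(log_text.split())
    counts = Counter()
    for m in RETRY_RE.finditer(flat):
        counts[(m["partition"], int(m["leader"]), m["error"])] += 1
    return counts
```

Passing the contents of server.log to `count_retries` gives one counter entry per partition, leader, and error code, which makes it easier to see whether only the partitions that moved to broker1 are affected.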



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
