[
https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500542#comment-16500542
]
Anil commented on KAFKA-6281:
-----------------------------
For my topic, leader is node 1.
During the same time I got this exception again , I see in the
server-state-change log, that the Leader has been changed to "Node 2". Is it
due to Node 1 going down ?.
If yes, is the same reason why I got exception "This server is not the leader
for that partition " from Producer ?. Here the leader is actually node 1, but
some how, for the node 2 there's a wrong signal sent, saying that node is the
Leader. Please clarify on this.
_--Exception : 2 --_
_-2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is
not the leader for that topic-partition._
{color:#FF0000}stage-change.log from node 1 :{color}
_*[2018-06-04 10:14:49,272] TRACE Broker 1 received LeaderAndIsr request
PartitionState(controllerEpoch=43, leader=2, leaderEpoch=23, isr=[2],
zkVersion=568, replicas=[1, 2]) correlation id 1 from controller 2 epoch 43 for
partition [MY_TOPIC_NAME,0] (state.change.logger)*_
_[2018-06-04 10:14:49,273] TRACE Broker 1 handling LeaderAndIsr request
correlationId 1 from controller 2 epoch 43 starting the become-follower
transition for partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:49,277] TRACE Broker 1 stopped fetchers as part of
become-follower request from controller 2 epoch 43 with correlation id 1 for
partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:49,282] TRACE Broker 1 truncated logs and checkpointed
recovery boundaries for partition [MY_TOPIC_NAME,0] as part of become-follower
request with correlation id 1 from controller 2 epoch 43 (state.change.logger)_
_[2018-06-04 10:14:49,284] TRACE Broker 1 started fetcher to new leader as part
of become-follower request from controller 2 epoch 43 with correlation id 1 for
partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:49,286] TRACE Broker 1 completed LeaderAndIsr request
correlationId 1 from controller 2 epoch 43 for the become-follower transition
for partition [MY_TOPIC_NAME,0] (state.change.logger)_
{color:#FF0000}stage-change.log from node 2 :{color}
_[2018-06-04 10:14:48,562] TRACE Controller 2 epoch 43 changed partition
[MY_TOPIC_NAME,0] state from OnlinePartition to OfflinePartition
(state.change.logger)_
_[2018-06-04 10:14:48,753] TRACE Controller 2 epoch 43 started leader election
for partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:48,757] TRACE Controller 2 epoch 43 elected leader 2 for
Offline partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:48,757] TRACE Controller 2 epoch 43 changed partition
[MY_TOPIC_NAME,0] from OfflinePartition to OnlinePartition with leader 2
(state.change.logger)_
_[2018-06-04 10:14:48,795] TRACE Controller 2 epoch 43 sending become-leader
LeaderAndIsr request (Leader:2,ISR:2,LeaderEpoch:23,ControllerEpoch:43) to
broker 2 for partition [MY_TOPIC_NAME,0] (state._
_change.logger)_
_[2018-06-04 10:14:48,796] TRACE Broker 2 received LeaderAndIsr request
PartitionState(controllerEpoch=43, leader=2, leaderEpoch=23, isr=[2],
zkVersion=568, replicas=[1, 2]) correlation id 341 from_
_controller 2 epoch 43 for partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:48,796] TRACE Broker 2 handling LeaderAndIsr request
correlationId 341 from controller 2 epoch 43 starting the become-leader
transition for partition [MY_TOPIC_NAME,0] (state.change_
_.logger)_
_[2018-06-04 10:14:48,798] TRACE Broker 2 stopped fetchers as part of
become-leader request from controller 2 epoch 43 with correlation id 341 for
partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:48,798] TRACE Broker 2 completed LeaderAndIsr request
correlationId 341 from controller 2 epoch 43 for the become-leader transition
for partition [MY_TOPIC_NAME,0] (state.change.log_
_ger)_
_[2018-06-04 10:14:48,976] TRACE Controller 2 epoch 43 changed state of replica
1 for partition [MY_TOPIC_NAME,0] from OnlineReplica to OfflineReplica
(state.change.logger)_
_[2018-06-04 10:14:49,251] TRACE Controller 2 epoch 43 sending become-leader
LeaderAndIsr request (Leader:2,ISR:2,LeaderEpoch:23,ControllerEpoch:43) to
broker 2 for partition [MY_TOPIC_NAME,0] (state._
_change.logger)_
_[2018-06-04 10:14:49,261] TRACE Controller 2 epoch 43 changed state of replica
1 for partition [MY_TOPIC_NAME,0] from OfflineReplica to OnlineReplica
(state.change.logger)_
_[2018-06-04 10:14:49,267] TRACE Controller 2 epoch 43 sending become-follower
LeaderAndIsr request (Leader:2,ISR:2,LeaderEpoch:23,ControllerEpoch:43) to
broker 1 for partition [MY_TOPIC_NAME,0] (stat_
_e.change.logger)_
_[2018-06-04 10:14:49,292] TRACE Broker 2 received LeaderAndIsr request
PartitionState(controllerEpoch=43, leader=2, leaderEpoch=23, isr=[2],
zkVersion=568, replicas=[1, 2]) correlation id 343 from_
_controller 2 epoch 43 for partition [MY_TOPIC_NAME,0] (state.change.logger)_
_[2018-06-04 10:14:49,293] WARN Broker 2 ignoring LeaderAndIsr request from
controller 2 with correlation id 343 epoch 43 for partition [MY_TOPIC_NAME,0]
since its associated leader epoch 23 is not hi_
_gher than the current leader epoch 23 (state.change.logger)_
> Kafka JavaAPI Producer failed with NotLeaderForPartitionException
> -----------------------------------------------------------------
>
> Key: KAFKA-6281
> URL: https://issues.apache.org/jira/browse/KAFKA-6281
> Project: Kafka
> Issue Type: Bug
> Reporter: Anil
> Priority: Major
> Attachments: server1-controller.log, server2-controller.log
>
>
> We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We
> have 2 producers (Java API) acting on different topics. Each topic has single
> partition. The topic where we had this issue, has one consumer running. This
> set up has been running fine for 3 months, and we saw this issue. All the
> suggested cases/solutions for this issue in other forums don't seem to apply
> for my scenario.
> Exception at producer;
> {code}
> -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR
> client.producer.BingLogProducerCallback - Encountered exception in sending
> message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException:
> This server is not the leader for that topic-partition.
> {code}
> We haven't enabled retries for the messages, because this is transactional
> data and we want to maintain the order.
> Producer config:
> {code}
> bootstrap.servers : server1ip:9092
> acks :all
> retries : 0
> linger.ms :0
> buffer.memory :10240000
> max.request.size :1024000
> key.serializer : org.apache.kafka.common.serialization.StringSerializer
> value.serializer : org.apache.kafka.common.serialization.StringSerializer
> {code}
> We are connecting to server1 at both producer and consumer. The controller
> log at server2 indicates there is some shutdown happened at during sametime,
> but I dont understand why this happened.
> {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in
> preferred replica Map() (kafka.controller.KafkaController) [2017-11-25
> 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is
> 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG
> [Controller 2]: topics not in preferred replica Map()
> (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE
> [Controller 2]: leader imbalance ratio for broker 1 is 0.000000
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO
> [SessionExpirationListener on 2], ZK expired; shut down all controller
> components and try to re-elect
> (kafka.controller.KafkaController$SessionExpirationListener) [2017-11-25
> 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] DEBUG
> [Controller 2]: De-registering IsrChangeNotificationListener
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO
> [delete-topics-thread-2], Shutting down
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,317] INFO [delete-topics-thread-2], Stopped
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed
> (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25
> 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped
> partition state machine (kafka.controller.PartitionStateMachine) [2017-11-25
> 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped replica
> state machine (kafka.controller.ReplicaStateMachine) [2017-11-25
> 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO
> [Controller-2-to-broker-2-send-thread], Stopped
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-2-send-thread], Shutdown completed
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Shutting down
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Stopped
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller-2-to-broker-1-send-thread], Shutdown completed
> (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO
> [Controller 2]: Broker 2 resigned as the controller
> (kafka.controller.KafkaController) [2017-11-25 17:34:18,353] DEBUG
> [IsrChangeNotificationListener] Fired!!!
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,353]
> DEBUG [IsrChangeNotificationListener] Fired!!!
> (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,354]
> INFO [BrokerChangeListener on Controller 2]: Broker change listener fired for
> path /brokers/ids with children 1,2
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener) [2017-11-25
> 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired
> for topics to be deleted
> (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25
> 17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification
> triggered {"version":1,"partitions":{"0":[1]}} for path /brokers/topics/ESQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[1]}} for path
> /brokers/topics/Test1
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[2]}} for path
> /brokers/topics/ImageQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition
> modification triggered
> {"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}}
> for path /brokers/topics/NMS_NotifyQ
> (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
> [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition
> modification triggered {"version":1,"partitions":{"0":[1]}} for path
> /brokers/topics/TempBinLogReqQ @{color}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)