[
https://issues.apache.org/jira/browse/KAFKA-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismael Juma resolved KAFKA-6879.
--------------------------------
Resolution: Fixed
> Controller deadlock following session expiration
> ------------------------------------------------
>
> Key: KAFKA-6879
> URL: https://issues.apache.org/jira/browse/KAFKA-6879
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Affects Versions: 1.1.0
> Reporter: Jason Gustafson
> Assignee: Jason Gustafson
> Priority: Critical
> Fix For: 2.0.0, 1.1.1
>
>
> We have observed an apparent deadlock following a session expiration. The
> suspected deadlock is between the ZooKeeper "initializationLock" and the
> latch inside the Expire event, which we use to ensure all events have been
> handled.
> In the logs, we see the "Session expired" message following acquisition of
> the initialization lock:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L358
> But we never see any logs indicating that the new session is being
> initialized. In fact, the controller logs are essentially empty from that
> point on. We suspect the problem is that the {{beforeInitializingSession}}
> callback cannot complete until all events have finished processing, since
> only then is the latch counted down:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L1525.
> However, an event that was dequeued just before the write lock was acquired
> may be unable to complete because it is itself waiting to acquire the
> initialization lock:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/zookeeper/ZooKeeperClient.scala#L137.
> The impact is that the broker continues in a zombie state. It continues
> fetching and is periodically added to ISRs, but it never receives any further
> requests from the controller since it is not registered.
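
The lock/latch cycle described above can be reproduced in isolation. The
following is a minimal sketch in Java, not the Kafka code itself. The class
and method names are hypothetical, a single-threaded executor stands in for
the controller event thread, and the sketch assumes the in-flight event needs
the read side of the initialization lock. The real code would block
indefinitely; timeouts are added here only so that the demonstration
terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the interaction described in the report.
class ExpirationDeadlockSketch {

    /**
     * Returns true if the expiration handler gave up waiting for the
     * Expire event, i.e. the cycle from the report was reproduced.
     */
    static boolean expirationHandlerStalls(long timeoutMs) {
        ReentrantReadWriteLock initializationLock = new ReentrantReadWriteLock();
        // One thread processes events in order, like a controller event queue.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch eventDequeued = new CountDownLatch(1);
        CountDownLatch writeLockHeld = new CountDownLatch(1);
        CountDownLatch expireProcessed = new CountDownLatch(1);

        // An ordinary event that is dequeued just before the session expires.
        eventThread.execute(() -> {
            eventDequeued.countDown();
            try {
                writeLockHeld.await();
                // Assumption: the event needs the read lock to talk to ZooKeeper.
                // Bounded wait so the sketch terminates.
                if (initializationLock.readLock().tryLock(2 * timeoutMs, TimeUnit.MILLISECONDS)) {
                    initializationLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            eventDequeued.await();
            // Session expiration: take the write lock, then wait for the
            // Expire event, which is queued behind the blocked event.
            initializationLock.writeLock().lock();
            try {
                writeLockHeld.countDown();
                eventThread.execute(expireProcessed::countDown);
                boolean processed = expireProcessed.await(timeoutMs, TimeUnit.MILLISECONDS);
                return !processed;
            } finally {
                initializationLock.writeLock().unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running sketch", e);
        } finally {
            eventThread.shutdownNow();
        }
    }
}
```

Calling ExpirationDeadlockSketch.expirationHandlerStalls(200) shows the
handler timing out while it still holds the write lock: the Expire event
cannot run until the earlier event finishes, and the earlier event cannot
finish until the write lock is released.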