Jason Gustafson created KAFKA-13944:
---------------------------------------
Summary: Shutting down broker can be elected as partition leader in KRaft
Key: KAFKA-13944
URL: https://issues.apache.org/jira/browse/KAFKA-13944
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
When a broker requests shutdown, it transitions to the CONTROLLED_SHUTDOWN
state in the controller. The broker can remain unfenced in this state until
the controlled shutdown completes. During a leader election, the only thing we
generally check is that the broker is unfenced, which means we can elect a
broker that is in controlled shutdown.
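To illustrate the gap, here is a minimal Java sketch contrasting an eligibility check that only looks at fencing with one that also excludes brokers in controlled shutdown. All names (LeaderElectionSketch, BrokerStatus, electLeader) are hypothetical and this is not the actual controller code; it also assumes a simplified election that picks the first eligible ISR replica.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Minimal sketch with hypothetical names; not the KRaft controller's actual code.
final class LeaderElectionSketch {
    // Hypothetical controller-side view of one broker. In the scenario from this
    // issue, a broker in controlled shutdown stays unfenced (fenced == false).
    static final class BrokerStatus {
        final boolean fenced;
        final boolean inControlledShutdown;

        BrokerStatus(boolean fenced, boolean inControlledShutdown) {
            this.fenced = fenced;
            this.inControlledShutdown = inControlledShutdown;
        }
    }

    // The check described above: only fencing matters, so a broker in
    // controlled shutdown still qualifies as a leader.
    static boolean unfencedOnly(BrokerStatus s) {
        return s != null && !s.fenced;
    }

    // A stricter check: the broker must be unfenced AND not shutting down.
    static boolean unfencedAndNotShuttingDown(BrokerStatus s) {
        return s != null && !s.fenced && !s.inControlledShutdown;
    }

    // Simplified election (an assumption): return the first ISR replica that
    // passes the check, or -1 (no leader) if none does.
    static int electLeader(List<Integer> isr, Map<Integer, BrokerStatus> brokers,
                           Predicate<BrokerStatus> eligible) {
        for (int replica : isr) {
            if (eligible.test(brokers.get(replica))) {
                return replica;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Like _foo-1: one replica on broker 2, unfenced but in controlled shutdown.
        Map<Integer, BrokerStatus> brokers = Map.of(2, new BrokerStatus(false, true));
        List<Integer> isr = List.of(2);
        System.out.println(electLeader(isr, brokers, LeaderElectionSketch::unfencedOnly));               // 2
        System.out.println(electLeader(isr, brokers, LeaderElectionSketch::unfencedAndNotShuttingDown)); // -1
    }
}
{code}

With the stricter check, the single-replica partition stays leaderless (-1) instead of flipping back to the shutting-down broker.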
Here are a few snippets from a recent system test in which this occurred:
{code:java}
// broker 2 starts controlled shutdown
[2022-05-26 21:17:26,451] INFO [Controller 3001] Unfenced broker 2 has requested and been granted a controlled shutdown. (org.apache.kafka.controller.BrokerHeartbeatManager)
// there is only one replica, so we set leader to -1
[2022-05-26 21:17:26,452] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: 2 -> -1, leaderEpoch: 0 -> 1, partitionEpoch: 0 -> 1 (org.apache.kafka.controller.ReplicationControlManager)
// controlled shutdown cannot complete immediately
[2022-05-26 21:17:26,529] DEBUG [Controller 3001] The request from broker 2 to shut down can not yet be granted because the lowest active offset 177 is not greater than the broker's shutdown offset 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2022-05-26 21:17:26,530] DEBUG [Controller 3001] Updated the controlled shutdown offset for broker 2 to 244. (org.apache.kafka.controller.BrokerHeartbeatManager)
// later on we elect leader 2 again
[2022-05-26 21:17:27,703] DEBUG [Controller 3001] partition change for _foo-1 with topic ID _iUQ72T_R4mmZgI3WrsyXw: leader: -1 -> 2, leaderEpoch: 1 -> 2, partitionEpoch: 1 -> 2 (org.apache.kafka.controller.ReplicationControlManager)
// now controlled shutdown is stuck because of the newly elected leader
[2022-05-26 21:17:28,531] DEBUG [Controller 3001] Broker 2 is in controlled shutdown state, but can not shut down because more leaders still need to be moved. (org.apache.kafka.controller.BrokerHeartbeatManager)
{code}
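Reading the log messages, the controlled shutdown appears to be granted only once two conditions hold: the lowest active offset is greater than the broker's shutdown offset, and the broker no longer leads any partitions. The Java sketch below encodes that reading and replays the timeline above. The names (ControlledShutdownSketch, check, ShutdownStatus) are hypothetical, it is not the BrokerHeartbeatManager implementation, and the order of the two checks is an assumption.

{code:java}
// Minimal sketch with hypothetical names, inferred from the log messages only.
final class ControlledShutdownSketch {
    enum ShutdownStatus { WAITING_FOR_OFFSET, LEADERS_REMAINING, CAN_SHUT_DOWN }

    // Shutdown is granted only when the lowest active offset exceeds the
    // broker's shutdown offset AND the broker leads no partitions.
    static ShutdownStatus check(long lowestActiveOffset, long shutdownOffset, int leaderCount) {
        if (lowestActiveOffset <= shutdownOffset) {
            return ShutdownStatus.WAITING_FOR_OFFSET;
        }
        if (leaderCount > 0) {
            return ShutdownStatus.LEADERS_REMAINING;
        }
        return ShutdownStatus.CAN_SHUT_DOWN;
    }

    public static void main(String[] args) {
        long shutdownOffset = 244;
        // 21:17:26,529: leadership already moved (leader 2 -> -1), but 177 <= 244.
        System.out.println(check(177, shutdownOffset, 0)); // WAITING_FOR_OFFSET
        // After 21:17:27,703 broker 2 leads _foo-1 again. 300 is an arbitrary
        // illustrative offset past 244; the real value is not in the logs.
        System.out.println(check(300, shutdownOffset, 1)); // LEADERS_REMAINING
        // Had broker 2 not been re-elected, the same check would pass.
        System.out.println(check(300, shutdownOffset, 0)); // CAN_SHUT_DOWN
    }
}
{code}

Under this reading, every re-election of the shutting-down broker puts a leader back on it, so the second condition can keep failing even after the offset condition is satisfied.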
--
This message was sent by Atlassian Jira
(v8.20.7#820007)