[
https://issues.apache.org/jira/browse/KAFKA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stanislav Kozlovski updated KAFKA-15823:
----------------------------------------
Fix Version/s: 3.8.0
(was: 3.7.0)
> NodeToControllerChannelManager: authentication error prevents controller
> update
> -------------------------------------------------------------------------------
>
> Key: KAFKA-15823
> URL: https://issues.apache.org/jira/browse/KAFKA-15823
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 3.6.0, 3.5.1
> Reporter: Gaurav Narula
> Priority: Major
> Fix For: 3.8.0
>
>
> NodeToControllerChannelManager caches the activeController address in an
> AtomicReference which is updated when:
> # activeController [has not been
> set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
> # networkClient [disconnects from the
> controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
> # A node replies with
> `[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
> and
> # a controller changes from [ZK mode to KRaft
> mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]
>
> When running multiple Kafka clusters in a dynamic environment, there is a
> chance that a controller's IP may get reassigned to another cluster's broker
> when the controller is bounced. In this scenario, the requests from Node to
> the Controller may fail with an AuthenticationException and are then retried
> indefinitely. This causes the node to get stuck as the new controller's
> information is never set.
>
> A potential fix would be to disconnect the network client and invoke
> `updateControllerAddress(null)` as we do in the `Errors.NOT_CONTROLLER` case.
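
The stuck state and the proposed fix can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code in NodeToControllerChannelManager: the class `ControllerCacheSketch`, the `Outcome` enum, the `resetOnAuthFailure` flag, and the addresses are all hypothetical, and the retry loop is bounded so the example terminates (the report describes unbounded retries). Clearing the cached reference stands in for `updateControllerAddress(null)`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the cached-controller behaviour described
// in the report. It is NOT Kafka's actual implementation.
public class ControllerCacheSketch {
    enum Outcome { OK, NOT_CONTROLLER, AUTHENTICATION_FAILED }

    // Cached controller address, analogous to the AtomicReference in the report.
    private final AtomicReference<String> activeController = new AtomicReference<>();
    // Address that controller discovery would return after the bounce.
    private final String currentController;
    // Assumed switch: whether an authentication failure clears the cache.
    private final boolean resetOnAuthFailure;

    public ControllerCacheSketch(String cached, String current, boolean resetOnAuthFailure) {
        this.activeController.set(cached);
        this.currentController = current;
        this.resetOnAuthFailure = resetOnAuthFailure;
    }

    // Stand-in for sending a request. Only the current controller accepts it;
    // any other address is treated as a broker of another cluster that
    // rejects this node's credentials.
    private Outcome send(String address) {
        return address.equals(currentController) ? Outcome.OK : Outcome.AUTHENTICATION_FAILED;
    }

    // Returns the attempt number that succeeded, or -1 if every attempt failed.
    public int sendWithRetries(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (activeController.get() == null) {
                // Rediscovery happens only when no controller is cached.
                activeController.set(currentController);
            }
            Outcome outcome = send(activeController.get());
            if (outcome == Outcome.OK) {
                return attempt;
            }
            if (outcome == Outcome.NOT_CONTROLLER
                    || (outcome == Outcome.AUTHENTICATION_FAILED && resetOnAuthFailure)) {
                // Analogous to updateControllerAddress(null) in the proposed fix.
                activeController.set(null);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String stale = "10.0.0.1:9093"; // IP now owned by another cluster's broker
        String fresh = "10.0.0.2:9093"; // address of the bounced controller
        // Without the reset, the stale address is retried until attempts run out.
        System.out.println(new ControllerCacheSketch(stale, fresh, false).sendWithRetries(5)); // prints -1
        // With the reset, the second attempt rediscovers the controller.
        System.out.println(new ControllerCacheSketch(stale, fresh, true).sendWithRetries(5)); // prints 2
    }
}
```

In this model the only difference between the two runs is whether an authentication failure is handled like `Errors.NOT_CONTROLLER`. Whether the real fix should also disconnect the network client first, as suggested above, is outside what the sketch covers.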
--
This message was sent by Atlassian Jira
(v8.20.10#820010)