Gaurav Narula created KAFKA-15823:
-------------------------------------

             Summary: NodeToControllerChannelManager: authentication error 
prevents controller update
                 Key: KAFKA-15823
                 URL: https://issues.apache.org/jira/browse/KAFKA-15823
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 3.5.1, 3.6.0
            Reporter: Gaurav Narula


NodeToControllerChannelManager caches the activeController address in an 
AtomicReference which is updated when:
 # activeController [has not been 
set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
 # networkClient [disconnects from the 
controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
 # A node replies with 
`[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
 and
 # The controller changes from [ZK mode to KRaft 
mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]

 

When running multiple Kafka clusters in a dynamic environment, there is a 
chance that a controller's IP gets reassigned to another cluster's broker 
when the controller is bounced. In this scenario, requests from the node to 
the controller may fail with an AuthenticationException and are then retried 
indefinitely. Since none of the conditions above is met, the cached 
activeController is never refreshed, and the node gets stuck because the new 
controller's information is never set.
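
The failure mode above can be illustrated with a small, self-contained model. This is only a sketch and not the Kafka code: the class name, the `sendFailsAuth` helper, and the address `10.0.0.5:9093` are hypothetical, and the retry loop is bounded so that the example terminates.

```java
import java.util.concurrent.atomic.AtomicReference;

class StuckControllerSketch {
    // Cached controller address, modelled on the AtomicReference described above.
    static final AtomicReference<String> activeController = new AtomicReference<>();

    // Hypothetical send: the cached IP now belongs to another cluster's
    // broker, so every request to it fails authentication.
    static boolean sendFailsAuth(String address) {
        return "10.0.0.5:9093".equals(address);
    }

    // Retries up to maxAttempts times without ever refreshing the cache,
    // and returns the number of failed attempts.
    static int retryWithoutRefresh(int maxAttempts) {
        int failures = 0;
        for (int i = 0; i < maxAttempts; i++) {
            String target = activeController.get();
            if (!sendFailsAuth(target)) {
                break; // request succeeded
            }
            failures++; // retried, but the cached address is left untouched
        }
        return failures;
    }

    public static void main(String[] args) {
        activeController.set("10.0.0.5:9093");
        // Every attempt fails because the stale address is never cleared.
        System.out.println(retryWithoutRefresh(5));
    }
}
```

In this model, every attempt targets the same stale address, which mirrors the node never learning about the new controller.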

 

A potential fix would be to disconnect the network client and invoke 
`updateControllerAddress(null)`, as we do in the `Errors.NOT_CONTROLLER` case.
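
A minimal sketch of the idea, written as a simplified Java model rather than against the actual Scala class: only `updateControllerAddress` and the `NOT_CONTROLLER` case come from the description above, while the `Outcome` enum, `handleResponse`, and `disconnect` are hypothetical stand-ins.

```java
import java.util.concurrent.atomic.AtomicReference;

class ControllerAddressSketch {
    // Hypothetical stand-ins for the response outcomes discussed above.
    enum Outcome { OK, NOT_CONTROLLER, AUTHENTICATION_FAILURE }

    // Cached controller address; null means "unknown, rediscover before the next send".
    private final AtomicReference<String> activeController = new AtomicReference<>();
    private int disconnects = 0;

    void updateControllerAddress(String address) {
        activeController.set(address);
    }

    String activeController() {
        return activeController.get();
    }

    int disconnects() {
        return disconnects;
    }

    // Hypothetical placeholder for closing the connection to a node.
    private void disconnect(String address) {
        disconnects++;
    }

    // Returns true if the request should be retried after rediscovery.
    boolean handleResponse(Outcome outcome) {
        switch (outcome) {
            case NOT_CONTROLLER:
            case AUTHENTICATION_FAILURE:
                // Treat an authentication failure like NOT_CONTROLLER:
                // drop the connection and clear the cached address.
                String stale = activeController.get();
                if (stale != null) {
                    disconnect(stale);
                }
                updateControllerAddress(null);
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        ControllerAddressSketch manager = new ControllerAddressSketch();
        manager.updateControllerAddress("10.0.0.5:9093");
        manager.handleResponse(Outcome.AUTHENTICATION_FAILURE);
        System.out.println(manager.activeController()); // prints "null"
    }
}
```

Clearing the cached address means the first condition in the list above (activeController has not been set) applies on the next attempt, so the node looks up the controller again instead of retrying against the stale address.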



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
