[ https://issues.apache.org/jira/browse/KAFKA-15828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Schofield reassigned KAFKA-15828:
----------------------------------------
Assignee: Andrew Schofield
> Protect clients from broker hostname reuse
> ------------------------------------------
>
> Key: KAFKA-15828
> URL: https://issues.apache.org/jira/browse/KAFKA-15828
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer, producer
> Reporter: Jason Gustafson
> Assignee: Andrew Schofield
> Priority: Major
> Labels: needs-kip
>
> In some environments, such as Kubernetes (k8s), brokers may be assigned to
> nodes dynamically from an available pool. During a rolling restart of the
> cluster, the client can see the same node advertised for different broker
> IDs within a short period of time. For example, kafka-1 might be initially assigned to
> node1. Before the client is able to establish a connection, it could be that
> kafka-3 is now on node1 instead. Currently there is no protection in the
> client or in the protocol for this scenario. If the connection succeeds, the
> client will assume it has a good connection to kafka-1. Until something
> disrupts the connection, it will continue under this assumption even if the
> hostname for kafka-1 changes.
> We have observed this scenario in practice. The client connected to the wrong
> broker through stale hostname information. It was unable to produce data
> because of persistent NOT_LEADER errors. The only way to recover in the end
> was by restarting the client to force a reconnection.
> We have discussed a couple of potential solutions to this problem:
> # Make the client smarter about managing the connection/hostname mapping.
> When it detects that a hostname has changed, it should force a disconnect
> to ensure it connects to the right node.
> # We can modify the protocol to verify that the client has connected to the
> intended broker. For example, we can add a field to ApiVersions to indicate
> the intended broker ID. The broker receiving the request can return an error
> if its ID does not match that in the request.
> Are there alternatives?
>
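To make the first option in the description concrete, here is a minimal sketch of client-side tracking. It is not taken from the Kafka client code: the class BrokerAddressTracker, its methods, and the "host:port" string format are all hypothetical names chosen for illustration. The idea is to remember which address each connection was opened against, and to flag a connection for disconnect when fresh metadata either advertises a different address for that broker ID or advertises the same address for a different broker ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of the Kafka client API. */
final class BrokerAddressTracker {
    // Address ("host:port") each open connection was established against, keyed by broker ID.
    private final Map<Integer, String> connectedAddress = new HashMap<>();

    /** Record that a connection intended for brokerId was opened to the given address. */
    void connected(int brokerId, String address) {
        connectedAddress.put(brokerId, address);
    }

    /**
     * Compare fresh metadata (broker ID to advertised address) with the addresses of
     * open connections. Returns the broker IDs whose connections should be closed,
     * and forgets those connections so they are not reported twice.
     */
    Set<Integer> staleConnections(Map<Integer, String> metadata) {
        Set<Integer> stale = new TreeSet<>();
        for (Map.Entry<Integer, String> conn : connectedAddress.entrySet()) {
            int id = conn.getKey();
            String address = conn.getValue();

            // Case 1: the broker is now advertised at a different address.
            String advertised = metadata.get(id);
            boolean moved = advertised != null && !advertised.equals(address);

            // Case 2: the address we connected to is now advertised for another broker ID.
            boolean reused = false;
            for (Map.Entry<Integer, String> m : metadata.entrySet()) {
                if (m.getKey() != id && m.getValue().equals(address)) {
                    reused = true;
                    break;
                }
            }

            if (moved || reused) {
                stale.add(id);
            }
        }
        connectedAddress.keySet().removeAll(stale);
        return stale;
    }
}
```

In the scenario from the description, a connection recorded as (1, "node1:9092") would be reported as stale once metadata shows kafka-3 at node1:9092, whether or not kafka-1 appears in that metadata. Note that this only helps after the client receives updated metadata; it does not cover the window before that.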
>
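The second option in the description (having the broker verify the intended broker ID) could look roughly like the check below. This is only a sketch under stated assumptions: the class IntendedBrokerCheck, the WRONG_BROKER result, and the -1 sentinel for clients that do not yet know which broker they are reaching (for example, a bootstrap connection) are hypothetical and do not correspond to any existing Kafka protocol field or error code.

```java
/** Hypothetical broker-side check for illustration only; not an existing Kafka API. */
final class IntendedBrokerCheck {
    /** Hypothetical outcomes; these are not real Kafka protocol error codes. */
    enum Result { OK, WRONG_BROKER }

    /** Assumed sentinel meaning "the client does not know the broker ID yet". */
    static final int UNKNOWN_BROKER_ID = -1;

    /**
     * Validate the broker ID the client intended to reach against the ID of the
     * broker that actually received the request.
     */
    static Result validate(int localBrokerId, int intendedBrokerId) {
        if (intendedBrokerId == UNKNOWN_BROKER_ID || intendedBrokerId == localBrokerId) {
            return Result.OK;
        }
        // The client reached a different broker than it expected, for example
        // because the hostname was reassigned during a rolling restart.
        return Result.WRONG_BROKER;
    }
}
```

With a check like this, a client that intended to reach kafka-1 but landed on kafka-3 would get an explicit error during connection setup and could refresh metadata and reconnect, rather than failing later with persistent NOT_LEADER errors.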
--
This message was sent by Atlassian Jira
(v8.20.10#820010)