[ https://issues.apache.org/jira/browse/KAFKA-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086717#comment-15086717 ]
Jun Rao commented on KAFKA-3068:
--------------------------------

There are two possible ways of configuring bootstrap servers: (1) using a VIP on a load balancer, in which case we can expect the VIP to map to the current live brokers; or (2) using a list of broker hosts, in which case it is the user's responsibility to keep the list up to date so that it contains at least one live broker. (A configuration sketch illustrating both options appears at the end of this message.)

When we hit a case where none of the current brokers (from the last metadata response) is connectable (e.g., the cluster shrinks to one node, that node dies, and the other brokers are restarted), falling back to the bootstrap servers will help if option (1) is used, since the VIP will let us connect to a live broker. If option (2) is used, falling back to the bootstrap servers may not help when none of the bootstrap servers is reachable; however, in that case it is really the user's responsibility to re-configure the bootstrap servers and restart the producer. So, overall, it seems that falling back to the bootstrap servers when all existing connections are gone will help (see the node-selection sketch at the end of this message).

Now, on caching old brokers longer than the metadata refresh interval: currently we can say that if you ever want to reuse a server in a different Kafka cluster, you should wait at least the metadata refresh interval after taking the broker down. If we cache the old brokers longer, that reasoning becomes more complicated. Also, from the above, I am not sure that old brokers are more useful than the configured bootstrap servers. Finally, we discussed falling back to the bootstrap servers early on in KAFKA-1303 for a different scenario, but didn't pursue it in the end.

> NetworkClient may connect to a different Kafka cluster than originally
> configured
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-3068
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3068
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.9.0.0
>            Reporter: Jun Rao
>
> In https://github.com/apache/kafka/pull/290, we added the logic to cache all
> brokers (id and ip) that the client has ever seen. If we can't find an
> available broker in the current metadata, we pick a broker we have seen
> before (in NetworkClient.leastLoadedNode()).
>
> One potential problem this logic can introduce is the following. Suppose we
> have a broker with id 1 in a Kafka cluster. A producer client remembers this
> broker in nodesEverSeen. At some point, we bring down this broker and reuse
> the host in a different Kafka cluster. The producer client then uses this
> broker from nodesEverSeen to refresh metadata, gets metadata from the other
> Kafka cluster, and starts producing data there.
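
To make options (1) and (2) from the comment above concrete, here is a minimal producer configuration sketch. The host names and topic are placeholders, not real endpoints; only the bootstrap.servers value differs between the two options. metadata.max.age.ms is the real client config for the metadata refresh interval mentioned above (default 5 minutes).

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BootstrapConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();

        // Option (1): a VIP on a load balancer; the VIP is expected to map to
        // the current live brokers. Host name is a placeholder.
        // props.put("bootstrap.servers", "kafka-vip.example.com:9092");

        // Option (2): an explicit broker list; the user must keep it up to
        // date so at least one entry is a live broker. Placeholder hosts.
        props.put("bootstrap.servers",
                  "broker1.example.com:9092,broker2.example.com:9092");

        // Metadata refresh interval: per the reasoning above, a host being
        // reused in a different cluster should stay down at least this long.
        props.put("metadata.max.age.ms", "300000");

        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
{code}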
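
And here is a simplified sketch of the node-selection order being proposed: prefer brokers from the current metadata, then fall back to the configured bootstrap servers rather than to nodesEverSeen. This is illustrative pseudologic, not the actual NetworkClient implementation; the class, its fields, and isConnectable() are assumptions made for the example (only org.apache.kafka.common.Node is a real client type).

{code:java}
import java.util.List;

import org.apache.kafka.common.Node;

// Illustrative sketch only -- not the real NetworkClient. It shows the
// fallback order discussed above: current metadata first, then the
// configured bootstrap servers, never nodesEverSeen.
class NodeSelectionSketch {
    private final List<Node> bootstrapNodes;   // parsed from bootstrap.servers
    private volatile List<Node> metadataNodes; // brokers in the last metadata response

    NodeSelectionSketch(List<Node> bootstrapNodes, List<Node> metadataNodes) {
        this.bootstrapNodes = bootstrapNodes;
        this.metadataNodes = metadataNodes;
    }

    Node leastLoadedNode() {
        // Prefer a connectable broker from the current metadata.
        for (Node node : metadataNodes) {
            if (isConnectable(node))
                return node;
        }
        // Every broker from the last metadata response is gone: fall back to
        // the configured bootstrap servers. With option (1) the VIP resolves
        // to a live broker; with option (2) the user keeps the list current.
        // Crucially, we never dip into nodesEverSeen, so a host reused in a
        // different Kafka cluster is never contacted.
        for (Node node : bootstrapNodes) {
            if (isConnectable(node))
                return node;
        }
        return null; // nothing reachable; the caller retries after a backoff
    }

    private boolean isConnectable(Node node) {
        // Hypothetical check; a real client would consult connection state
        // and the reconnect backoff for the node.
        return node != null;
    }
}
{code}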