Luke Chen created KAFKA-13653:
---------------------------------
Summary: Proactively discover alive brokers from bootstrap server
lists when all nodes are down
Key: KAFKA-13653
URL: https://issues.apache.org/jira/browse/KAFKA-13653
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 3.1.0
Reporter: Luke Chen
Currently, a metadata update is triggered in 2 ways:
# partition leader change
# metadata expiry (default 5 mins)
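As a rough illustration, the 5-minute expiry in the second item corresponds to the client setting metadata.max.age.ms (300000 ms by default). The sketch below only builds a java.util.Properties object with that setting; the class and method names (MetadataAgeConfig, clientProps) are made up for illustration and are not part of the Kafka client.

```java
import java.util.Properties;

// Illustrative helper (hypothetical name): collects the two client settings
// that matter for this issue into a Properties object.
final class MetadataAgeConfig {
    // bootstrapServers: comma-separated host list used for initial discovery.
    // metadataMaxAgeMs: how long cached metadata is kept before a forced refresh.
    static Properties clientProps(String bootstrapServers, long metadataMaxAgeMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("metadata.max.age.ms", Long.toString(metadataMaxAgeMs));
        return props;
    }
}
```

Lowering metadata.max.age.ms narrows the window described in this issue, at the cost of more frequent metadata requests; it does not remove the window.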
However, the client and the brokers are sometimes started at the same time, so
the client might discover only some of the brokers at first. If the
discovered brokers then go down unexpectedly within 5 mins (before the metadata
expires), the client has no chance to update its metadata, and from the
client's point of view the whole cluster is down.
Ex:
1. brokerA is up
2. producer is up, discovers brokerA, and updates its metadata
3. brokerB and brokerC come up (but the producer doesn't know, and the leader
imbalance check interval has not elapsed (5 mins default))
4. producer keeps producing data without error
5. brokerA goes down, let's say, 3 mins after the producer started
6. Now the producer cannot reach the cluster at all, even though brokerB and
brokerC are up
We should proactively discover active brokers via the bootstrap server config
when there are no nodes left to connect to. So, in the above example, if
bootstrap.servers is set to "brokerA_IP,brokerB_IP,brokerC_IP", then the client
should be able to discover brokerB and brokerC after step 6.
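The proposed behaviour could be sketched roughly as follows. This is a minimal, hypothetical illustration of the fallback decision only, not the actual Kafka client code: the class BootstrapFallback, its methods, and the idea of tracking unreachable nodes in a set are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "fall back to the bootstrap list when every
// discovered node is down". Names here are illustrative assumptions.
final class BootstrapFallback {
    // Splits a comma-separated bootstrap.servers value into trimmed addresses.
    static List<String> parseBootstrap(String bootstrapServers) {
        List<String> out = new ArrayList<>();
        for (String s : bootstrapServers.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns the nodes to try for the next metadata request: the reachable
    // nodes from cached metadata if any exist, otherwise the bootstrap list.
    static List<String> candidates(List<String> knownNodes,
                                   Set<String> unreachable,
                                   String bootstrapServers) {
        List<String> alive = new ArrayList<>();
        for (String node : knownNodes) {
            if (!unreachable.contains(node)) {
                alive.add(node);
            }
        }
        if (!alive.isEmpty()) {
            return alive;
        }
        // Every discovered node is down: re-read the configured bootstrap
        // list, skipping addresses already known to be unreachable.
        List<String> fallback = new ArrayList<>();
        for (String address : parseBootstrap(bootstrapServers)) {
            if (!unreachable.contains(address)) {
                fallback.add(address);
            }
        }
        return fallback;
    }
}
```

With the example from this issue, the producer knows only brokerA_IP and that node is unreachable, so candidates(...) with the bootstrap value "brokerA_IP,brokerB_IP,brokerC_IP" yields brokerB_IP and brokerC_IP. A real implementation would also need to handle reconnect backoff and DNS re-resolution, which this sketch ignores.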