[ https://issues.apache.org/jira/browse/KAFKA-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson resolved KAFKA-6916.
------------------------------------
    Resolution: Fixed

> AdminClient does not refresh metadata on broker failure
> -------------------------------------------------------
>
>                 Key: KAFKA-6916
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6916
>             Project: Kafka
>          Issue Type: Task
>          Components: admin
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Major
>             Fix For: 2.0.0
>
>
> There are intermittent test failures in DynamicBrokerReconfigurationTest when
> brokers are restarted. The test uses ephemeral ports, so the ports after a
> server restart are not the same as the ports before the restart. The tests
> rely on metadata refresh in producers, consumers and admin clients to obtain
> the new server ports when connections fail. This works with producers and
> consumers, but results in intermittent failures with the admin client because
> the refresh is not triggered.
> There are a couple of issues in AdminClient:
> # Unlike producers and consumers, AdminClient does not request a metadata
> update when the connection to a broker fails. This is particularly bad if the
> controller goes down. The controller is used for various requests like
> createTopics and describeTopics. If the controller goes down and
> adminClient.describeTopics() is invoked, AdminClient sends the request to the
> old controller. If the connection fails, it keeps retrying with the same
> address. Metadata refresh is never triggered. The request times out after 2
> minutes by default, while metadata is not refreshed for 5 minutes by default.
> We should refresh metadata whenever the connection to a broker fails.
> # Admin client requests are always retried on the same node. In the example
> above, if the controller goes down and a new controller is elected, it would
> be better if the retried request were sent to the new controller. Otherwise
> we are just blocking the call for 2 minutes with a lot of retries that can
> never succeed.
>
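The two points in the description can be illustrated with a small standalone sketch. This is not the AdminClient code and not the actual fix for this issue; the `Metadata` and `Network` types, the `callController` helper and the broker addresses are all hypothetical stand-ins. It only shows the intended behaviour: a failed connection requests a metadata update, and each retry re-resolves the controller instead of reusing the old address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are Kafka's real classes. */
final class AdminRetrySketch {

    /** Minimal stand-in for cluster metadata: it only caches the controller address. */
    static final class Metadata {
        private final Supplier<String> cluster; // stands in for a metadata request to the cluster
        private String controller;
        private boolean updateRequested;

        Metadata(Supplier<String> cluster) {
            this.cluster = cluster;
            this.controller = cluster.get();
        }

        /** Marks the cached view as stale so the next lookup refreshes it. */
        void requestUpdate() {
            updateRequested = true;
        }

        /** Returns the cached controller, refreshing first if an update was requested. */
        String controller() {
            if (updateRequested) {
                controller = cluster.get();
                updateRequested = false;
            }
            return controller;
        }
    }

    /** Stand-in for the network layer: a send succeeds only if the address is reachable. */
    interface Network {
        boolean send(String address);
    }

    /** Sends a controller request, returning the addresses tried in order. */
    static List<String> callController(Metadata metadata, Network network, int maxAttempts) {
        List<String> attempted = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            String target = metadata.controller(); // point 2: re-resolve the node on every attempt
            attempted.add(target);
            if (network.send(target)) {
                return attempted;
            }
            metadata.requestUpdate(); // point 1: a failed connection triggers a metadata refresh
        }
        throw new IllegalStateException("gave up after trying " + attempted);
    }

    public static void main(String[] args) {
        String[] liveController = {"broker-0:9092"};
        Metadata metadata = new Metadata(() -> liveController[0]);

        // The controller moves (restart on a new ephemeral port, or a new election).
        liveController[0] = "broker-1:9093";

        List<String> attempts =
            callController(metadata, address -> address.equals(liveController[0]), 5);
        System.out.println(attempts); // prints [broker-0:9092, broker-1:9093]
    }
}
```

Without the `requestUpdate()` call, or with the target resolved once outside the loop, every attempt in this sketch would go to `broker-0:9092` and the call would fail after exhausting its attempts, which mirrors the behaviour described in the report.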
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)