[ 
https://issues.apache.org/jira/browse/KAFKA-6916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-6916.
------------------------------------
    Resolution: Fixed

> AdminClient does not refresh metadata on broker failure
> -------------------------------------------------------
>
>                 Key: KAFKA-6916
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6916
>             Project: Kafka
>          Issue Type: Task
>          Components: admin
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Major
>             Fix For: 2.0.0
>
>
> There are intermittent test failures in DynamicBrokerReconfigurationTest when 
> brokers are restarted. The test uses ephemeral ports, so the ports after a 
> server restart differ from the ports before the restart. The tests rely on 
> metadata refresh in producers, consumers and admin clients to obtain the new 
> server ports when connections fail. This works with producers and consumers, 
> but results in intermittent failures with the admin client because the 
> refresh is not triggered.
> There are a couple of issues in AdminClient:
>  # Unlike producers and consumers, AdminClient does not request a metadata 
> update when the connection to a broker fails. This is particularly bad if 
> the controller goes down. The controller is used for various requests like 
> createTopics and describeTopics. If the controller goes down and 
> adminClient.describeTopics() is invoked, AdminClient sends the request to 
> the old controller. If the connection fails, it keeps retrying with the same 
> address. Metadata refresh is never triggered. The request times out after 2 
> minutes by default, while metadata is not refreshed for 5 minutes by 
> default, so the call fails before the stale metadata is replaced. We should 
> refresh metadata whenever a connection to a broker fails.
>  # Admin client requests are always retried on the same node. In the example 
> above, if the controller goes down and a new controller is elected, the 
> retried request should be sent to the new controller. Otherwise we are 
> just blocking the call for 2 minutes with a lot of retries that can never 
> succeed.
>  
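
The two points above can be illustrated with a small, self-contained
simulation. This is only a sketch under stated assumptions, not the code
from the fix: Cluster, MetadataCache, Result, callController and simulate
are hypothetical names that do not exist in Kafka, and the "connection" is
reduced to a liveness check. It contrasts retrying against a stale
controller address with refreshing metadata on failure and re-resolving
the target node on each attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy simulation only. None of these types exist in Kafka; they are
// hypothetical stand-ins for the cluster, the client metadata and a call.
final class AdminRetrySketch {

    /** Broker-side ground truth. */
    static final class Cluster {
        final Set<Integer> liveBrokers = new HashSet<>();
        int controllerId;
    }

    /** Client-side view of the controller, which may be stale. */
    static final class MetadataCache {
        int controllerId;
        int refreshCount;

        void refresh(Cluster cluster) {
            controllerId = cluster.controllerId;
            refreshCount++;
        }
    }

    static final class Result {
        final boolean succeeded;
        final List<Integer> attemptedNodes;

        Result(boolean succeeded, List<Integer> attemptedNodes) {
            this.succeeded = succeeded;
            this.attemptedNodes = attemptedNodes;
        }
    }

    static Result callController(Cluster cluster, MetadataCache cache,
                                 int maxAttempts, boolean refreshOnFailure) {
        List<Integer> attempted = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Resolve the target on every attempt instead of pinning the first node.
            int target = cache.controllerId;
            attempted.add(target);
            if (cluster.liveBrokers.contains(target) && target == cluster.controllerId) {
                return new Result(true, attempted);
            }
            // The attempt failed: refresh metadata immediately rather than
            // waiting for the periodic refresh interval to expire.
            if (refreshOnFailure) {
                cache.refresh(cluster);
            }
        }
        return new Result(false, attempted);
    }

    /** Scenario from the description: the controller dies and a new one is elected. */
    static Result simulate(boolean refreshOnFailure) {
        Cluster cluster = new Cluster();
        cluster.liveBrokers.add(0);
        cluster.liveBrokers.add(1);
        cluster.controllerId = 0;

        MetadataCache cache = new MetadataCache();
        cache.refresh(cluster); // the client learns that broker 0 is the controller

        cluster.liveBrokers.remove(0); // broker 0 goes down
        cluster.controllerId = 1;      // broker 1 becomes the controller

        return callController(cluster, cache, 3, refreshOnFailure);
    }

    public static void main(String[] args) {
        Result before = simulate(false);
        Result after = simulate(true);
        // prints: no refresh:   succeeded=false attempts=[0, 0, 0]
        System.out.println("no refresh:   succeeded=" + before.succeeded
                + " attempts=" + before.attemptedNodes);
        // prints: with refresh: succeeded=true attempts=[0, 1]
        System.out.println("with refresh: succeeded=" + after.succeeded
                + " attempts=" + after.attemptedNodes);
    }
}
```

Without the refresh, every retry goes to the dead broker until the attempts
run out. With it, the second attempt already reaches the new controller.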



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)