ivankelly opened a new pull request #11688: URL: https://github.com/apache/pulsar/pull/11688
Lookups and PartitionMetadataRequests (PMR) get a TooManyRequests error if the broker they are talking to is making a lot of async requests to ZK, and is therefore accepting and parking a lot of requests. This happens when a herd of clients requests many different topics.

However, if there are few topics, the result is cached and served synchronously. If a herd occurs under these conditions, clients get a timeout instead, because the I/O handler threads can be saturated and unable to serve the requests in time.

In certain scenarios, such as when the service URL is a VIP or load balancer, the retried request hits the exact same endpoint. If the serving broker is overloaded, or simply broken, the client gets the same result again. Lookups and PMR can be served by any broker, so after a certain number of failures it makes sense to try another broker. Closing the connection achieves this, since the client's next connection through the service URL may land on a different broker.

The setting ClientBuilder#maxNumberOfRejectedRequestPerConnection already closes the connection once a threshold of rejected requests is reached. Until now, only requests that received a TooManyRequests response counted as rejected. This change adds Timeout to the list of "rejected" responses.
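A minimal sketch of the counting rule described above, assuming a simple per-connection counter. The `RejectionTracker` class, the `Outcome` enum and the `record` method are hypothetical names for illustration only, not Pulsar's actual client internals. It shows the one behavioral change: a timeout now counts toward the threshold alongside TooManyRequests.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Pulsar's real code): counts "rejected" lookup/PMR
 * responses on one connection and signals when the connection should be closed,
 * so the next attempt can reach a different broker.
 */
public class RejectionTracker {

    /** Hypothetical outcome of a lookup or PartitionMetadataRequest. */
    public enum Outcome { SUCCESS, TOO_MANY_REQUESTS, TIMEOUT, OTHER_ERROR }

    // Plays the role of ClientBuilder#maxNumberOfRejectedRequestPerConnection.
    private final int maxRejectedPerConnection;
    private final AtomicInteger rejected = new AtomicInteger();

    public RejectionTracker(int maxRejectedPerConnection) {
        this.maxRejectedPerConnection = maxRejectedPerConnection;
    }

    /** Previously only TOO_MANY_REQUESTS counted; this change adds TIMEOUT. */
    static boolean isRejection(Outcome outcome) {
        return outcome == Outcome.TOO_MANY_REQUESTS || outcome == Outcome.TIMEOUT;
    }

    /**
     * Records one response outcome. Returns true when the threshold is reached,
     * meaning the caller should close this connection and reconnect.
     */
    public boolean record(Outcome outcome) {
        if (!isRejection(outcome)) {
            return false;
        }
        if (rejected.incrementAndGet() >= maxRejectedPerConnection) {
            rejected.set(0); // assumption: the replacement connection starts from zero
            return true;
        }
        return false;
    }

    public int rejectedCount() {
        return rejected.get();
    }

    public static void main(String[] args) {
        RejectionTracker tracker = new RejectionTracker(3);
        Outcome[] outcomes = {
            Outcome.SUCCESS, Outcome.TIMEOUT, Outcome.TOO_MANY_REQUESTS, Outcome.TIMEOUT
        };
        for (Outcome o : outcomes) {
            if (tracker.record(o)) {
                System.out.println("closing connection after " + o);
            }
        }
    }
}
```

In this sketch a successful response does not reset the counter. That is a simplifying assumption, and a real client may instead decay the count over time. The point it illustrates is that with Timeout treated as a rejection, a broker that is too saturated to answer at all is abandoned just like one that answers TooManyRequests.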
