merlimat opened a new pull request #414:
URL: https://github.com/apache/pulsar-client-go/pull/414
### Motivation
There is a problem with the re-connection logic introduced in #157.
The change added logic to keep retrying to establish a TCP connection with
the broker until the "operation timeout" expires (30 seconds by default).
There are a few issues with it:
1. (minor) It does not check that the error is actually a TCP error (e.g., it
would also retry on authentication failures).
2. (major) After a TCP connection failure, reconnecting to the same broker
is always the wrong approach: the most likely outcome is that the next
attempt will also fail and, worse, the IP might be unresponsive, in which
case we have to wait for the full connection timeout.

The correct solution after a connection failure is to re-do the topic
lookup, since the topic will be moving to a different broker and we need to
reconnect to the new broker as soon as possible.

The only case where this connection retry logic is appropriate is for requests
that are not tied to a particular broker (e.g., lookup operations). In that
case, a quick retry after a connection failure will probably land the request
on a different, healthy broker.