merlimat opened a new pull request #414:
URL: https://github.com/apache/pulsar-client-go/pull/414


   ### Motivation
   
   There is a problem with the re-connection logic introduced in #157. 
   
   That change added logic that keeps retrying to establish a TCP connection 
with the broker until the "operation timeout" (default 30 seconds) expires. 
   
   There are a few issues with it: 
    1. (minor) It does not check that the error is actually a TCP error (e.g. it 
would also retry on auth failures).
    2. (major) After a TCP connection failure, reconnecting to the same broker 
is always the wrong approach: the most likely outcome is that the next 
attempt will also fail and, worse, the IP might be unresponsive, in which case 
we have to wait for the full connection timeout. 
   
   The correct solution after a connection failure is to redo the topic 
lookup, since the topic will be moving to a different broker and we need to 
reconnect to the new broker as soon as possible. 
   
   The only case where this connection retry logic is appropriate is for 
requests that are not specific to a particular broker (e.g. lookup operations). 
In this case a quick retry after a connection failure will probably land the 
request on a different, healthy broker.
   
   

