Hi Cheng,

socket.connection.setup.timeout.ms seems more consistent with our existing 
configuration names than socket.connections.setup.timeout.ms (with an s).  What 
do you think?

> If no connected or connecting node exists, provide the disconnected node which
> respects the reconnect backoff with the least number of failed attempts.

I think we need to rethink this part.  For example, if a new node joins the 
cluster, it will have 0 failed connect attempts, whereas the existing nodes 
will probably have more than 0.  So all the clients will ignore every other 
node and pile on to the new one.  That's not good.  I think we should just keep 
the existing random behavior.  If the node isn't blacklisted due to connection 
backoff, it should be fair game to be connected to.
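
To make "keep the existing random behavior" concrete, here is a minimal Java sketch of the fallback case only, where no connected or connecting node exists. It drops every node still inside its reconnect backoff window and picks uniformly at random among the rest. It never looks at failed-attempt counts, so a newly joined node gets no special preference. The `NodeState` record, its `reconnectBackoffUntilMs` field, and `pickNode` are hypothetical names for illustration, not NetworkClient's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomNodeSelection {
    // Hypothetical minimal view of a broker: its id plus the time (ms) until
    // which the client must not reconnect to it because of reconnect backoff.
    record NodeState(int id, long reconnectBackoffUntilMs) {
        boolean inBackoff(long nowMs) {
            return nowMs < reconnectBackoffUntilMs;
        }
    }

    // Picks uniformly at random among nodes that are not in reconnect backoff.
    // Failed-attempt counts are deliberately ignored, so no node is favored.
    // Returns -1 if every node is currently backing off.
    static int pickNode(List<NodeState> nodes, long nowMs, Random random) {
        List<NodeState> eligible = new ArrayList<>();
        for (NodeState node : nodes) {
            if (!node.inBackoff(nowMs)) {
                eligible.add(node);
            }
        }
        if (eligible.isEmpty()) {
            return -1;
        }
        return eligible.get(random.nextInt(eligible.size())).id();
    }

    public static void main(String[] args) {
        long now = 1_000L;
        List<NodeState> nodes = List.of(
            new NodeState(1, 0L),      // never backed off: eligible
            new NodeState(2, 5_000L),  // still backing off: skipped
            new NodeState(3, 500L));   // backoff already expired: eligible
        // Prints either 1 or 3, chosen at random.
        System.out.println(pickNode(nodes, now, new Random()));
    }
}
```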

On a related note, I think it would be good to have an exponential connection 
setup timeout backoff, similar to what we do with reconnect backoff.

Consider the case where we need to talk to the controller but it is not 
responding.  With the current proposal, we will keep trying to reconnect every 
10 seconds, which could lead to more reconnection attempts than happen today.  
And in the rare case where the node takes more than 10 seconds to process new 
connections, a fixed 10-second timeout would prevent us from ever connecting.

An exponential strategy could start at 10 seconds, then 20, then 40, then 80, 
up to some limit.  That would reduce the extra load and also handle the 
(hopefully very rare) case where connections take a long time to establish.
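
For illustration, the 10 -> 20 -> 40 -> 80 second escalation can be sketched in Java as below. This is a minimal sketch under stated assumptions: the 10-second starting value comes from the proposal, the 80-second cap is a hypothetical stand-in for "some limit", `timeoutMs` and its parameters are hypothetical names rather than NetworkClient's API, and any jitter a real implementation might add is omitted.

```java
public class ConnectionSetupTimeoutBackoff {
    // Returns the connection setup timeout to use after `failedAttempts`
    // consecutive setup timeouts: initialMs, doubled once per failure,
    // never exceeding maxMs. Jitter is deliberately omitted.
    static long timeoutMs(int failedAttempts, long initialMs, long maxMs) {
        long timeout = initialMs;
        // Stop doubling once the cap is reached; this also avoids long overflow
        // for large attempt counts (assuming maxMs <= Long.MAX_VALUE / 2).
        for (int i = 0; i < failedAttempts && timeout < maxMs; i++) {
            timeout *= 2;
        }
        return Math.min(timeout, maxMs);
    }

    public static void main(String[] args) {
        long initialMs = 10_000L; // 10 s starting value from the discussion
        long maxMs = 80_000L;     // hypothetical cap standing in for "some limit"
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("failed attempts=" + attempt
                + " -> timeout " + timeoutMs(attempt, initialMs, maxMs) + " ms");
        }
    }
}
```

With these values the loop prints timeouts of 10000, 20000, 40000, 80000, 80000 and 80000 ms. The sketch assumes the failure count would be reset after a successful connection, so a healthy node goes back to the short timeout.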

best,
Colin


On Fri, May 15, 2020, at 19:07, Cheng Tan wrote:
> Hello developers,
> 
> Big thanks for all the feedback. KIP-601 is finalized and ready for a vote.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-601%3A+Configurable+socket+connection+timeout+in+NetworkClient
> 
> Best,
> - Cheng Tan
