Hi Cheng,

socket.connection.setup.timeout.ms seems more consistent with our existing configuration names than socket.connections.setup.timeout.ms (with an s). What do you think?
> If no connected or connecting node exists, provide the disconnected node which
> respects the reconnect backoff with the least number of failed attempts.

I think we need to rethink this part. For example, if a new node joins the cluster, it will have 0 failed connection attempts, whereas the existing nodes will probably have more than 0. So all the clients will ignore every other node and pile on to the new one. That's not good.

I think we should just keep the existing random behavior. If the node isn't blacklisted due to connection backoff, it should be fair game to connect to.

On a related note, I think it would be good to have an exponential connection setup timeout backoff, similar to what we do with reconnect backoff. Consider the case where we need to talk to the controller but it is not responding. With the current proposal, we will keep trying to reconnect every 10 seconds. That could lead to more reconnection attempts than happen today. And in the rare case where the node takes more than 10 seconds to process new connections, a fixed 10-second timeout will prevent us from connecting at all. An exponential strategy could start at 10 seconds, then go to 20, then 40, then 80, up to some limit. That would reduce the extra load and also handle the (hopefully very rare) case where connections take a long time to establish.

best,
Colin

On Fri, May 15, 2020, at 19:07, Cheng Tan wrote:
> Hello developers,
>
> Big thanks for all the feedback. KIP-601 is finalized and ready for a vote.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-601%3A+Configurable+socket+connection+timeout+in+NetworkClient
>
> Best,
> - Cheng Tan
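The exponential connection setup timeout Colin suggests can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not Kafka's NetworkClient code: the class `ConnectionSetupTimeout`, its `timeoutMs` method, and the 120-second cap are assumptions for illustration (the message only says "up to some limit"). The timeout starts at the 10-second base and doubles once per consecutive setup timeout, clamping at the cap.

```java
/**
 * Hypothetical sketch of an exponential connection setup timeout.
 * Names and the cap value are illustrative assumptions, not Kafka APIs.
 */
final class ConnectionSetupTimeout {
    private final long baseTimeoutMs; // first attempt, e.g. 10 s as in the proposal
    private final long maxTimeoutMs;  // assumed upper limit ("up to some limit")

    ConnectionSetupTimeout(long baseTimeoutMs, long maxTimeoutMs) {
        if (baseTimeoutMs <= 0 || maxTimeoutMs < baseTimeoutMs) {
            throw new IllegalArgumentException("need 0 < base <= max");
        }
        this.baseTimeoutMs = baseTimeoutMs;
        this.maxTimeoutMs = maxTimeoutMs;
    }

    /** Timeout for the next attempt, given the number of consecutive setup timeouts so far. */
    long timeoutMs(int failedAttempts) {
        long timeout = baseTimeoutMs;
        // Double once per failure; clamp to the cap before doubling could overflow or exceed it.
        for (int i = 0; i < failedAttempts && timeout < maxTimeoutMs; i++) {
            timeout = (timeout > maxTimeoutMs / 2) ? maxTimeoutMs : timeout * 2;
        }
        return timeout;
    }

    public static void main(String[] args) {
        // 10 s base, hypothetical 120 s cap.
        ConnectionSetupTimeout t = new ConnectionSetupTimeout(10_000, 120_000);
        for (int attempts = 0; attempts <= 5; attempts++) {
            System.out.println(attempts + " -> " + t.timeoutMs(attempts) + " ms");
        }
    }
}
```

With these values, `main` prints 10000, 20000, 40000 and 80000 ms for attempts 0 through 3, then 120000 ms for attempts 4 and 5. Deriving the timeout from the failure count, rather than storing the last value, makes it trivial to reset after a successful connection. A real implementation would probably also add random jitter, as reconnect backoff does, so that many clients do not retry in lockstep.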