On Mon, May 18, 2020, at 14:41, Cheng Tan wrote:
> Dear Colin,
> 
> 
> Thanks for the suggestions.
> 
> > For example, if a new node joins the cluster, it will have 0 failed connect 
> > attempts, whereas the existing nodes will probably have more than 0.  So 
> > all the clients will ignore every other node and pile on to the new one.  
> > That's not good
> 
> 
> The existing behavior is not random when there’s no connected or 
> connecting node. leastLoadedNode() will always return the node that 
> respects the connection backoff and has the largest array index in the 
> cached node list. The shuffle only happens after a metadata fetch. Thus, 
> when the client is not able to fetch metadata, the cached nodes won’t 
> get shuffled. So I proposed considering the failed attempts together 
> with the connection backoff. 
> 
> The potential issue you mentioned makes sense. An alternative I can 
> think of is to randomly pick a disconnected node that respects the 
> connection backoff.
> 
> > Consider the case where we need to talk to the controller but it is not 
> > responding.  With the current proposal we will keep trying to reconnect 
> > every 10 seconds.  That could lead to more reconnection attempts than what 
> > happens today.  In the rare case where the node is taking more than 10 
> > seconds to process new connections, it will prevent us from connecting 
> > completely.
> 
> An exponential timeout makes sense. I also have some thoughts about the 
> parameter tuning. Since Java NIO will time out and retry the socket 
> channel connection exponentially after 1s, 2s, 4s, 8s, …, we’d better 
> make the default value a power of 2, since the cumulative timeout under 
> Java NIO is 2^x - 1. 
> 
> For example, if socket.connection.setup.timeout = 10, Java NIO will 
> only get to use a maximum timeout of 4s, since 1 + 2 + 4 = 7 and 
> the last try gets less than 3s, which is useless. However, if we set 
> socket.connection.setup.timeout = 8 or 16, the last try won’t be 
> wasted, since 1 + 2 + 4 = 7 and 1 + 2 + 4 + 8 = 15.
> 
> 
> Please let me know what you think. Thanks.

Hi Cheng,

It sounds like we agree that exponential is better.  Maybe check out what we do 
for reconnect backoff to see one possible way to set it up (with a minimum 
timeout and a maximum timeout, doubling until we hit the maximum).
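
To make the doubling scheme concrete, here is a minimal sketch in Java.  It is 
not Kafka's actual reconnect backoff code; the class name 
ConnectionSetupTimeout, the method timeoutMs, and the 80-second cap used in 
the usage note are all hypothetical and chosen only for illustration.

```java
/**
 * Hypothetical helper (not part of Kafka): computes a connection setup
 * timeout that starts at a minimum and doubles after every failed
 * attempt until it reaches a maximum.
 */
final class ConnectionSetupTimeout {
    private final long minTimeoutMs;
    private final long maxTimeoutMs;

    ConnectionSetupTimeout(long minTimeoutMs, long maxTimeoutMs) {
        if (minTimeoutMs <= 0 || maxTimeoutMs < minTimeoutMs) {
            throw new IllegalArgumentException(
                "require 0 < minTimeoutMs <= maxTimeoutMs");
        }
        this.minTimeoutMs = minTimeoutMs;
        this.maxTimeoutMs = maxTimeoutMs;
    }

    /** Timeout to use for the next attempt after failedAttempts failures. */
    long timeoutMs(int failedAttempts) {
        long timeout = minTimeoutMs;
        for (int i = 0; i < failedAttempts && timeout < maxTimeoutMs; i++) {
            // Double, but clamp to the maximum without overflowing a long.
            timeout = (timeout > maxTimeoutMs / 2) ? maxTimeoutMs : timeout * 2;
        }
        return timeout;
    }
}
```

With a minimum of 10 seconds and an assumed maximum of 80 seconds, 
new ConnectionSetupTimeout(10_000L, 80_000L) would yield 10s, 20s, 40s, 80s 
for 0 to 3 failed attempts and stay at 80s afterwards.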

I didn't follow the comment about why a power of two is better.  The attempt to 
establish the TCP connection is handled by the operating system, not by Java, 
right?  Java NIO doesn't dictate how long we should wait before terminating the 
attempt to connect.  Hope I didn't miss anything.

best,
Colin


> 
> Best, - Cheng Tan
> 
> 
> 
> > On May 18, 2020, at 1:32 PM, Colin McCabe <cmcc...@apache.org> wrote:
> > 
> > Hi Cheng,
> > 
> > socket.connection.setup.timeout.ms seems more consistent with our existing 
> > configuration names than socket.connections.setup.timeout.ms (with an s).  
> > What do you think?
> > 
> >> If no connected or connecting node exists, provide the disconnected node 
> >> which respects the reconnect backoff with the least number of failed 
> >> attempts.
> > 
> > I think we need to rethink this part.  For example, if a new node joins the 
> > cluster, it will have 0 failed connect attempts, whereas the existing nodes 
> > will probably have more than 0.  So all the clients will ignore every other 
> > node and pile on to the new one.  That's not good.  I think we should just 
> > keep the existing random behavior.  If the node isn't blacklisted due to 
> > connection backoff, it should be fair game to be connected to.
> > 
> > On a related note, I think it would be good to have an exponential 
> > connection setup timeout backoff, similar to what we do with reconnect 
> > backoff.
> > 
> > Consider the case where we need to talk to the controller but it is not 
> > responding.  With the current proposal we will keep trying to reconnect 
> > every 10 seconds.  That could lead to more reconnection attempts than what 
> > happens today.  In the rare case where the node is taking more than 10 
> > seconds to process new connections, it will prevent us from connecting 
> > completely.
> > 
> > An exponential strategy could start at 10 seconds, then do 20, then 40, 
> > then 80, up to some limit.  That would reduce the extra load and also 
> > handle the (hopefully very rare) case where connections are taking a long 
> > time to connect.
> > 
> > best,
> > Colin
> > 
> 
>
