Re: [DISCUSS] KIP-601: Configurable socket connection timeout

Colin McCabe Tue, 28 Apr 2020 14:58:23 -0700

Hi Cheng,

Thanks for the KIP.

> Currently, the initial socket connection timeout is depending on system 
> setting tcp_syn_retries. The actual timeout value is 2 ^ (tcp_sync_retries + 
> 1) - 1 seconds

This section is confusing since it refers to Linux configuration settings 
without mentioning identifying them as such.  I wondered for a second if Kafka 
had a tcp_syn_retries setting (it doesn't.)  You should make it more clear when 
you are referring to an operating-system-specific detail.

> We propose a new common client config: connections.timeout.ms

The name "connections.timeout.ms" seems unclear.  If I had to guess about what 
"connections.timeout.ms" was, I would probably guess that it was some kind of 
idle timeout on the connection as a whole.  I wouldn't think that it only 
applied to connection setup.  How about something like 
"connection.setup.timeout.ms"?

> Currently, when no nodes provided in --boostrap-server option is
> connected, the LeastLoadedNodeProvider will provide an unconnected
> node for the client. The Cluster class shuffled the nodes to balance the
> initial pressure and the LeastLoadedNodeProvider will always provide
> the same node, which is the last node after shuffling. Consequently, 
> though we may provide several bootstrap servers, the client not be able 
> to connect to the cluster if any of the servers in the --bootstrap-server 
> list 
> is offline.

Hmm... I can't follow what this paragraph is trying to say.  What's the 
connection between LeastLoadedProvider and TCP connection setup timeouts?

> ClusterConnectionStates will keep the index of the most recently 
> provided node. Meanwhile, a public API looks like below will be added 
> to simulate the round-robin node picking.

Again, this seems unrelated to the problem described in the motivation section.

I would really suggest creating a separate KIP for this, since it seems 
unrelated to the problem of TCP timeouts.  When KIPs try to do too many things 
at once, they often lose focus and get delayed.

> To Discuss
> 1. We can think about if we want to have different 
> connect.timeout.ms for different clients.
> 2. What would be the default value for connect.timeout.ms (is 10s
> too short?)
> 3. Should the Selector / IdleExpiryManager throw an exception 
> when the initial socket channel connection is timeout? Personally 
> I don't think we need as we will continue trying the rest of the 
> candidate nodes.

It's good to get feedback, but in general everything in a KIP is "To discuss," 
so we should not have a separate section titled "To discuss."  :) 

We also want the KIP to reflect what we actually decided on and implemented, so 
it shouldn't include open questions (unless they are questions about future 
work that is not covered in the scope of the KIP).

To come back to the timeout question, I think you should motivate your decision 
(and it does have to be decided here, not left as an open question), by 
explaining its relationship to other timeouts, and to the underlying TCP 
three-way handshake which it is putting an upper bound on.  10 seconds doesn't 
seem unreasonable in that context.

I believe that the default value of 127 seconds in Linux is set as high as it 
is to compensate for the fact that many applications do not retry connections 
if the initial establishment fails.  That isn't the case for Kafka, which 
motivates a lower default for us than for them.

Question 3 seems to be about whether a connection setup failure should result 
in the node getting moved to an error state (and hence not getting picked a 
second time by LeastLoadedNode, for example).  It seems pretty clear that a 
connection setup error is an error, right?  (Also, phrasing this as "should we 
throw an exception" is unnecessarily implementation specific.)

For rejected alternatives, we should explain why the KIP introduces a new 
timeout configuration rather than terminating an unsuccessful TCP connection 
setup attempt after request.timeout.ms has elapsed.

Also, the "Compatibility, Deprecation, and Migration Plan" should say that 
there is no impact (the section is just blank now)

Thanks again for the KIP.  This seems like it has been a gap in Kafka's error 
handling for a while, and it's good to see a proposal to fix it.

best,
Colin

On Mon, Apr 27, 2020, at 18:48, Cheng Tan wrote:
> Hi developers,
> 
> 
> I’m proposing KIP-601 to support configurable socket connection 
> timeout. 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-601%3A+Configurable+socket+connection+timeout
>  
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-601:+Configurable+socket+connection+timeout>
> 
> Currently, the initial socket connection timeout is depending on system 
> setting tcp_syn_retries. The actual timeout value is 2 ^ 
> (tcp_sync_retries + 1) - 1 seconds. For the reasons below, we want to 
> control the client-side socket timeout directly using configuration 
> files. 
>       • The default value of tcp_syn_retries is 6. It means the default 
> timeout value is 127 seconds and too long in some scenarios. For 
> example, when the user specifies a list of N bootstrap-servers and no 
> connection has been built between the client and the servers, the least 
> loaded node provider will poll all the server nodes specified by the 
> user. If M servers in the bootstrap-servers list are offline, the 
> client may take (127 * M) seconds to connect to the cluster. In the 
> worst case when M = N - 1, the wait time can be several minutes.
>       • Though we may set the default value of tcp_syn_retries smaller, we 
> will then change the system level network behaviors, which might cause 
> other issues.
>       • Applications depending on KafkaAdminClient may want to robustly know 
> and control the initial socket connect timeout, which can help throw 
> corresponding exceptions in their layer.
> 
> Please let me know if you have any thoughts or suggestions. Thanks.
> 
> 
> Best, - Cheng Tan
> 
>

Re: [DISCUSS] KIP-601: Configurable socket connection timeout

Reply via email to