Hi developers,

I’m proposing KIP-601 to support a configurable socket connection timeout.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-601%3A+Configurable+socket+connection+timeout

Currently, the initial socket connection timeout depends on the system setting
tcp_syn_retries. Because the SYN retransmission timeout starts at 1 second and
doubles after each retry, the actual timeout value is
2 ^ (tcp_syn_retries + 1) - 1 seconds. For the reasons below, we want to
control the client-side socket connection timeout directly using
configuration files.
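To make the arithmetic concrete, here is a small sketch that computes the OS-level connect timeout both by summing the individual retransmission intervals and with the closed form above. It assumes an initial SYN retransmission timeout of 1 second that doubles on every retry (exponential backoff); the class and method names are hypothetical and not part of Kafka.

```java
/**
 * Sketch: estimates how long an initial TCP connect can wait before the
 * kernel gives up, ASSUMING an initial SYN retransmission timeout of 1 s
 * that doubles after every retry. Names are hypothetical, not Kafka APIs.
 */
public final class SynTimeout {

    private SynTimeout() {}

    // Sums the wait after each SYN: 1 + 2 + 4 + ... + 2^retries seconds.
    public static long summedTimeoutSeconds(int tcpSynRetries) {
        checkRange(tcpSynRetries);
        long total = 0;
        long rto = 1; // assumed initial retransmission timeout, in seconds
        for (int attempt = 0; attempt <= tcpSynRetries; attempt++) {
            total += rto;
            rto *= 2; // exponential backoff
        }
        return total;
    }

    // Closed form of the same geometric series: 2^(retries + 1) - 1.
    public static long closedFormTimeoutSeconds(int tcpSynRetries) {
        checkRange(tcpSynRetries);
        return (1L << (tcpSynRetries + 1)) - 1;
    }

    // Keeps the shift and the running sum inside the range of a long.
    private static void checkRange(int tcpSynRetries) {
        if (tcpSynRetries < 0 || tcpSynRetries > 61) {
            throw new IllegalArgumentException("tcpSynRetries out of range: " + tcpSynRetries);
        }
    }

    public static void main(String[] args) {
        for (int retries = 0; retries <= 6; retries++) {
            System.out.printf("tcp_syn_retries=%d -> %d s (summed) / %d s (closed form)%n",
                    retries, summedTimeoutSeconds(retries), closedFormTimeoutSeconds(retries));
        }
    }
}
```

With the default tcp_syn_retries = 6, both methods give 127 seconds, matching the figure quoted below.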
        • The default value of tcp_syn_retries is 6, which means the default
timeout is 127 seconds. This is too long in some scenarios. For example, when
the user specifies a list of N bootstrap servers and no connection has yet been
established between the client and the servers, the least loaded node provider
will poll all the server nodes specified by the user. If M servers in the
bootstrap-servers list are offline, the client may take (127 * M) seconds to
connect to the cluster. In the worst case, when M = N - 1, the wait time can be
several minutes.
        • Although we could lower the value of tcp_syn_retries, doing so would
change system-level network behavior, which might cause other issues.
        • Applications depending on KafkaAdminClient may want to reliably know
and control the initial socket connection timeout, which would help them throw
corresponding exceptions in their own layer.
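The bootstrap scenario in the first point can be sketched as simple arithmetic. This assumes, as the example above does, that the client tries servers one at a time, that every offline server costs one full connection timeout, and that the only live server is tried last. The server count (N = 5) and the 10-second client-side timeout are illustrative values only, not Kafka defaults or proposed settings.

```java
/**
 * Sketch: worst-case time to reach a live bootstrap server when each offline
 * server costs one full connection timeout. ASSUMES sequential attempts; the
 * values in main are illustrative, not Kafka defaults.
 */
public final class BootstrapDelay {

    private BootstrapDelay() {}

    // Total wait = (number of offline servers tried first) * (timeout per attempt).
    public static long worstCaseSeconds(int offlineServers, long perConnectTimeoutSeconds) {
        if (offlineServers < 0 || perConnectTimeoutSeconds < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact((long) offlineServers, perConnectTimeoutSeconds);
    }

    public static void main(String[] args) {
        int n = 5;                   // hypothetical number of bootstrap servers
        int m = n - 1;               // worst case: only the last server is online
        long osTimeout = 127;        // tcp_syn_retries = 6
        long configuredTimeout = 10; // hypothetical client-side timeout

        System.out.printf("OS-controlled timeout:    %d s%n", worstCaseSeconds(m, osTimeout));
        System.out.printf("Client-configured timeout: %d s%n", worstCaseSeconds(m, configuredTimeout));
    }
}
```

With N = 5 the OS-controlled case waits 508 seconds (about 8.5 minutes), while a 10-second client-side timeout would cap the same worst case at 40 seconds. This difference is what a configurable timeout is meant to expose.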

Please let me know if you have any thoughts or suggestions. Thanks.


Best,
Cheng Tan
