addisonj commented on pull request #14841:
URL: https://github.com/apache/pulsar/pull/14841#issuecomment-1077726407


   Thanks for starting this discussion Lari.
   
   In the bookkeeper protocol, we *do* rely on TCP level keep-alive as opposed 
to application level keep-alive... and by all accounts, we should just switch 
to an application level keep-alive instead.
   
   AFAICT, even if we were to enable TCP keep-alive, it is not likely to have 
much of an impact, because the OS level defaults, imho, basically make it 
useless. See https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html 
for details, but essentially the default OS level settings are such that:
   
   * it only starts after a connection has been idle for 2 hours
   * it then only sends a probe every 75 seconds
   * it takes 9 failed probes to mark the socket as failed
   
   In summary, it takes about 11 minutes (9 probes, 75 seconds apart) to time 
out a dead socket, and that only begins once the connection has been idle for 
2 hours.
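
   To make the arithmetic concrete, here is a minimal sketch that adds up the Linux defaults described in the HOWTO linked above (`tcp_keepalive_time`, `tcp_keepalive_intvl`, `tcp_keepalive_probes`). The class and method names are hypothetical and only for illustration, and the result is approximate, since the exact timing depends on the kernel.

```java
import java.time.Duration;

public class KeepAliveDefaults {
    // Linux defaults, per the TCP Keepalive HOWTO.
    static final int IDLE_SECONDS = 7200;   // tcp_keepalive_time
    static final int INTERVAL_SECONDS = 75; // tcp_keepalive_intvl
    static final int PROBE_COUNT = 9;       // tcp_keepalive_probes

    /** Time spent sending unanswered probes before the socket is failed. */
    static Duration probingWindow(int intervalSeconds, int probeCount) {
        return Duration.ofSeconds((long) intervalSeconds * probeCount);
    }

    /** Idle wait before the first probe, plus the probing window. */
    static Duration worstCaseDetection(int idleSeconds, int intervalSeconds, int probeCount) {
        return Duration.ofSeconds(idleSeconds)
                .plus(probingWindow(intervalSeconds, probeCount));
    }

    public static void main(String[] args) {
        // 9 * 75 s = 675 s
        System.out.println(probingWindow(INTERVAL_SECONDS, PROBE_COUNT)); // PT11M15S
        // 7200 s + 675 s = 7875 s
        System.out.println(
                worstCaseDetection(IDLE_SECONDS, INTERVAL_SECONDS, PROBE_COUNT)); // PT2H11M15S
    }
}
```

   So with the defaults, a peer that silently disappears on an otherwise idle connection goes unnoticed by TCP keep-alive for a bit over 2 hours and 11 minutes.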
   
   So these defaults, combined with the fact that we already have an 
application level keep-alive, make TCP keep-alive not that useful by default, 
I think...
   
   All that said, I am not against making it optional. The cost of TCP 
keep-alive is near zero, and for "power users" it can provide a redundant 
safeguard against zombie sockets. I suppose it is possible for some new bugs 
to surface if we start getting socket hang-ups in other code paths, since 
right now a dead socket is most likely to be closed by the heartbeat failing.
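
   For illustration, this is roughly what an opt-in could look like for a "power user" who also wants to override the OS defaults per socket. It is a sketch using the plain JDK socket API (Java 11+), not how Pulsar or BookKeeper configure their Netty channels. The class name, method name, and the 60/10/3 values are made up for the example, and the `jdk.net.ExtendedSocketOptions` timers are not supported on every platform, hence the check.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

public class TunedKeepAlive {
    /**
     * Enables SO_KEEPALIVE and, where the platform supports it, overrides the
     * OS-wide timers for this socket only.
     *
     * @return true if the per-socket timers were applied, false if only
     *         SO_KEEPALIVE (with OS defaults) is in effect
     */
    static boolean enable(Socket socket, int idleSeconds, int intervalSeconds, int probes)
            throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false; // fall back to the 2 hour / 75 second / 9 probe defaults
        }
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        return true;
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) { // unconnected; options can be set up front
            // Example values: first probe after 60 s idle, then every 10 s, 3 probes.
            boolean tuned = enable(socket, 60, 10, 3);
            System.out.println("per-socket timers applied: " + tuned);
        }
    }
}
```

   With values like these, a dead peer would be detected after roughly 60 + 3 * 10 = 90 seconds of idleness instead of over 2 hours, which is the range where it starts to be a meaningful backstop to the application level heartbeat.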
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

