Thank you all for your advice. I dismissed keep-alive (KA) early on for
the wrong reasons: I assumed there must be something better available
that I had missed. I'll go with keep-alive.
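
For anyone landing on this thread later, here is a minimal sketch of one
way to turn keep-alive on without editing the legacy read loop, assuming
the legacy code lets you plug in a javax.net.SocketFactory. The class name
KeepAliveSocketFactory is made up for illustration; only
Socket#setKeepAlive and the SocketFactory methods are real JDK API. The
probe timing itself still comes from the OS (on Linux, the
net.ipv4.tcp_keepalive_* sysctls), so the 2 hour default mentioned in the
quoted replies would need lowering separately.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/**
 * Hypothetical wrapper (not from the thread): delegates socket creation
 * to another factory and enables SO_KEEPALIVE on every socket it returns.
 */
public class KeepAliveSocketFactory extends SocketFactory {
    private final SocketFactory delegate;

    public KeepAliveSocketFactory(SocketFactory delegate) {
        this.delegate = delegate;
    }

    // Only sets the flag; idle time, probe interval and probe count are
    // taken from the operating system's keep-alive settings.
    private static Socket withKeepAlive(Socket socket) throws IOException {
        socket.setKeepAlive(true);
        return socket;
    }

    @Override
    public Socket createSocket() throws IOException {
        return withKeepAlive(delegate.createSocket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(String host, int port,
                               InetAddress localHost, int localPort) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port,
                               InetAddress localAddress, int localPort) throws IOException {
        return withKeepAlive(delegate.createSocket(address, port, localAddress, localPort));
    }
}
```

Usage would be along the lines of
new KeepAliveSocketFactory(SocketFactory.getDefault()), handed to whatever
hook in the legacy stack accepts a factory. Whether such a hook exists
depends on the code in question.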

2016-11-29 13:16 GMT+01:00 Greg Young <gregoryyou...@gmail.com>:
> In my experience, protocol-level TCP keep-alives don't always work
> between implementations. BSD to Windows used to be a primary culprit:
> even though they were set, they would not fire in some cases. Things may
> be better today. Within the same implementation they should work quite
> well. Definitely worth testing if you deal with multiple implementations.
>
> On Tue, Nov 29, 2016 at 12:00 PM, Justin Mason <j...@jmason.org> wrote:
>> I think that, as the Zalando blog post suggested, you could use OS-level TCP
>> keepalive to test the connections regularly, so the kernel will eventually
>> notice that the TCP connection is now dead:
>> http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html -- by default
>> this waits for 2 hours of inactivity, which seems too long for many use
>> cases.
>>
>> I generally prefer to perform app-level keepalives with app-controlled
>> timeouts and retry settings, but in this case if it's legacy code, a
>> kernel-level sysctl tweak may be more palatable!
>>
>> --j.
>>
>> On Tue, 29 Nov 2016 at 09:48 Alen Vrečko <alen.vre...@gmail.com> wrote:
>>>
>>> No. It is just a typical "off the shelf" Linux setup. Thanks for the
>>> insight.
>>>
>>> 2016-11-29 10:35 GMT+01:00 Wojciech Kudla <wojciech.ku...@gmail.com>:
>>> > Any chance that socket connection is handled by some sort of kernel
>>> > bypass?
>>> > All bets with blocking IO are off when running with onload/offload
>>> > drivers.
>>> >
>>> >
>>> > On Tue, 29 Nov 2016, 09:29 Alen Vrečko, <alen.vre...@gmail.com> wrote:
>>> >>
>>> >> Got a situation where a thread hung on a socket read (old-school
>>> >> blocking socket I/O code). One side was in TCP ESTABLISHED while the
>>> >> other was in FIN_WAIT_2. The customer was "upgrading" the switches at
>>> >> the time this
>>> >> happened.
>>> >>
>>> >> The thread will never complete. It should get a timeout exception. But
>>> >> it doesn't. There is the call to Socket#setSoTimeout in the code. It
>>> >> should do the job. My first thought was there must be a bug in
>>> >> setSoTimeout. I never had much faith in SoTimeout. Was not surprised
>>> >> to find a lot of bug reports related to socketRead0 hangs. Reminded me
>>> >> of this blog post about a hung postgres connection [1].
>>> >>
>>> >> I'd use nio and app level timeouts. But it is legacy code that I
>>> >> can't/don't want to touch.
>>> >>
>>> >> Been thinking of using a custom SocketFactory that wraps the sockets
>>> >> with some monitoring code. Pretty ugly. It doesn't feel right.
>>> >>
>>> >> Found quite a few discussions about this. But not really any solutions
>>> >> that don't require app level changes.
>>> >>
>>> >> Any thoughts? Anybody in a similar boat?
>>> >>
>>> >> [1] https://tech.zalando.com/blog/hack-to-terminate-tcp-conn-postgres/
>>> >>
>>> >> --
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups
>>> >> "mechanical-sympathy" group.
>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>> >> an
>>> >> email to mechanical-sympathy+unsubscr...@googlegroups.com.
>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>>
>>
>
>
>
> --
> Studying for the Turing test
>
