On Wed, Jun 12, 2019 at 12:08:03PM -0500, Dave Chiluk wrote:
> I did a bit more introspection on our TIME_WAIT connections.  The increase
> in sockets in TIME_WAIT is definitely from old connections to our backend
>  server instances.  Considering the fact that this server is doesn't
> actually serve real traffic we can make a reasonable assumptions that this
> is almost entirely due to increases in healthchecks.

Great!

> Doing an strace on haproxy 1.8.17 we see
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> sudo strace -e setsockopt,close -p 15743
> strace: Process 15743 attached
> setsockopt(17, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
> close(17)                               = 0
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Doing the same strace on 1.9.8 we see
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
> sudo strace -e setsockopt,close -p 6670
> strace: Process 6670 attached
> setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> close(4)                                = 0
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> The calls to setsockopt(17, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8)
> = 0
> appear to be missing.

Awesome, that's exactly the info I was missing. I suspected that for
whatever reason the lingering was not disabled, at least now we have
a proof of this! Now the trick is to figure why :-/

> We are running centos 7 with kernel 3.10.0-957.1.3.el7.x86_64.

OK, and with the setsockopt it should behave properly.

> I'll keep digging into this, and see if I can get stack traces that result
> in teh setsockopt calls on 1.8.17 so the stack can be more closely
> inspected.

Don't worry for this now, this is something we at least need to resolve
before issuing 2.0 or it will cause some trouble. Then we'll backport the
fix once the cause is figured out.

However when I try here I don't have the problem, either in 1.9.8 or 2.0-dev7 :

08:27:30.212570 connect(14, {sa_family=AF_INET, sin_port=htons(9003), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
08:27:30.212590 recvfrom(14, NULL, 2147483647, 
MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
08:27:30.212610 setsockopt(14, SOL_SOCKET, SO_LINGER, {l_onoff=1, l_linger=0}, 
8) = 0
08:27:30.212630 close(14)               = 0
08:27:30.212659 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=0, 
tv_nsec=6993282}) = 0

So it must depend on the type of check. Could you please share a minimalistic
config that reproduces this ?

Thanks,
Willy

Reply via email to