Hi Vincent,

> SigIgn is correct (SIGPIPE) is ignored. However, SigBlk seems
> incorrect. HAProxy only blocks signals when dequeuing them. However, no
> signal is pending either, so they should be delivered? Maybe it was bad
> luck? If you try again, does SigBlk become 0?

No matter how many times I send signals to this process, SigBlk remains
fffffffe7bfa7a26.

I did find something that seems odd, though, when I was looking at the file
descriptors that this PID has open. When I run lsof I see this (output
filtered for brevity, excludes linked libraries):

    COMMAND   PID USER    FD TYPE DEVICE             SIZE/OFF   NODE NAME
    haproxy 11537 haproxy 0u 0000 0,9                       0   5350 anon_inode
    haproxy 11537 haproxy 4u unix 0xffff88023306d200      0t0 126464 /tmp/haproxy.sock.11536.tmp
    haproxy 11537 haproxy 5u IPv4 126465                  0t0    TCP 192.168.10.15:http (LISTEN)
    haproxy 11537 haproxy 6u IPv4 126466                  0t0    UDP *:58171
    haproxy 11537 haproxy 7u IPv4 126467                  0t0    TCP 192.168.200.100:http (LISTEN)
    haproxy 11537 haproxy 8u IPv4 126468                  0t0    TCP 192.168.200.120:http (LISTEN)
    haproxy 11537 haproxy 9u IPv4 126469                  0t0    TCP 192.168.200.110:http (LISTEN)

The first odd thing is that /tmp/haproxy.sock.11536.tmp does not exist. The
socket file that the stats socket uses is defined as /tmp/haproxy.sock in
haproxy.cfg. It is either odd or a coincidence that the number in the
filename is one less than the PID.

After running lsof I took a look at /proc/11537/fd:

    ls -l /proc/11537/fd
    total 0
    lrwx------ 1 root root 64 Oct 30 15:29 0 -> anon_inode:[eventpoll]
    lrwx------ 1 root root 64 Oct 30 15:29 4 -> socket:[126464]
    lrwx------ 1 root root 64 Oct 30 15:29 5 -> socket:[126465]
    lrwx------ 1 root root 64 Oct 30 15:29 6 -> socket:[126466]
    lrwx------ 1 root root 64 Oct 30 15:29 7 -> socket:[126467]
    lrwx------ 1 root root 64 Oct 30 15:29 8 -> socket:[126468]
    lrwx------ 1 root root 64 Oct 30 15:29 9 -> socket:[126469]

It won't show here, but all of those symlinks are displayed as non-existent
(flashing red). Each one of them corresponds to an FD listed in the lsof
output, which shows haproxy actively listening on those sockets. (?) And if
I connect to any of those IPs:ports, they pass traffic.

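The descriptor check above can also be scripted. This is a minimal sketch, assuming Linux procfs; the helper names `list_fds` and `socket_inode` are made up for illustration and are not part of HAProxy or lsof. Note that link targets such as `socket:[126464]` name a kernel inode rather than a filesystem path, so `ls` colors them as dangling even when the descriptor is healthy. The inode number is the same value lsof reports in its NODE (or DEVICE) column.

```python
import os
import re


def socket_inode(target):
    """Return the inode from a 'socket:[N]' link target, or None otherwise."""
    match = re.fullmatch(r"socket:\[(\d+)\]", target)
    return int(match.group(1)) if match else None


def list_fds(pid):
    """Map each open fd number of `pid` to its /proc link target (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # Inspect the current process; substitute the PID of interest (needs
    # sufficient privileges to read another user's /proc/<pid>/fd).
    for fd, target in sorted(list_fds(os.getpid()).items()):
        print(fd, target, socket_inode(target))
```

Matching the printed inodes against the lsof output shows which descriptor backs which listener without relying on the color `ls` gives the links.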
Looking at tcp_bind_listener() in proto_tcp.c (comment on line 782), it
indicates that the desired behavior is to reuse file descriptors instead of
creating a new socket. Would these orphaned file descriptors indicate that a
new socket was created instead of the file descriptors being reused? I
wonder whether that is the case, given the behavior I saw with kernel 2.6
and the "cannot bind socket" message on reloads. Because of SO_REUSEPORT in
kernel 3.9 and later, additional processes are allowed to bind to these same
IPs:ports, and that may be masking the issue on 3.9+ kernels.

The other thing I'm wondering is whether signal_unregister_handler() in
sig_soft_stop() in haproxy.c has removed all of the signal handlers that
this PID would otherwise be listening for, and that is why it is
unresponsive to anything other than SIGKILL. soft_stop() is called right
before signal_unregister_handler(), so is it possible that something went
sideways while executing soft_stop(), leaving this PID sort of in limbo?

Regards,
Chris

On Fri, Oct 30, 2015 at 3:28 PM, Vincent Bernat <[email protected]> wrote:

> ❦ 30 October 2015 15:14 -0400, Chris Riley <[email protected]> :
>
> > SigQ: 3/63840
> > SigPnd: 0000000000000000
> > SigBlk: fffffffe7bfa7a26
> > SigIgn: 0000000000001000
> > SigCgt: 0000000180300205
>
> SigIgn is correct (SIGPIPE) is ignored. However, SigBlk seems
> incorrect. HAProxy only blocks signals when dequeuing them. However, no
> signal is pending either, so they should be delivered? Maybe it was bad
> luck? If you try again, does SigBlk become 0?
> --
> Don't stop with your first draft.
>   - The Elements of Programming Style (Kernighan & Plauger)
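
To make the masks quoted above easier to read, they can be decoded into signal numbers. This is a minimal sketch, assuming the /proc/<pid>/status convention that bit n-1 of the hex mask stands for signal n; `decode_mask` and `signal_name` are hypothetical helper names, and the names printed depend on the platform's signal numbering.

```python
import signal


def decode_mask(hex_mask):
    """Return the sorted signal numbers whose bits are set in a hex mask."""
    value = int(hex_mask, 16)
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]


def signal_name(signum):
    """Best-effort name for a signal number on the running platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Real-time or otherwise unnamed signals have no enum member.
        return f"signal {signum}"


if __name__ == "__main__":
    # Values copied from the /proc status excerpt quoted in this thread.
    masks = {
        "SigBlk": "fffffffe7bfa7a26",
        "SigIgn": "0000000000001000",
        "SigCgt": "0000000180300205",
    }
    for field, mask in masks.items():
        print(field, [signal_name(n) for n in decode_mask(mask)])
```

With x86 Linux numbering, SigIgn decodes to signal 13 (SIGPIPE) alone, and the SigBlk value has the bits for signals 10 (SIGUSR1) and 15 (SIGTERM) set, which would be consistent with the process not reacting to those signals.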

