Hi Vincent,

> SigIgn is correct (SIGPIPE) is ignored. However, SigBlk seems
> incorrect. HAProxy only blocks signals when dequeuing them. However, no
> signal is pending either, so they should be delivered? Maybe it was
> bad luck? If you try again, does SigBlk become 0?

No matter how many times I send signals to this process, SigBlk stays at
fffffffe7bfa7a26.
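
To make the masks easier to read, here is a minimal sketch that decodes them. It assumes the usual /proc/<pid>/status convention that bit n-1 of the hex mask stands for signal n. The helper names decode_sigmask and signal_name are mine, not anything from HAProxy.

```python
import signal


def decode_sigmask(hex_mask):
    """Return the signal numbers whose bits are set in a /proc/<pid>/status mask."""
    mask = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def signal_name(signum):
    """Best-effort name lookup; falls back to the bare number when unnamed."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return str(signum)


if __name__ == "__main__":
    # Values copied from the /proc/<pid>/status output quoted in this thread.
    fields = [
        ("SigBlk", "fffffffe7bfa7a26"),
        ("SigIgn", "0000000000001000"),
        ("SigCgt", "0000000180300205"),
    ]
    for field, value in fields:
        names = [signal_name(n) for n in decode_sigmask(value)]
        print(field, names)
```

On x86 Linux, SigIgn decodes to signal 13 (SIGPIPE) only, which matches your reading. The SigBlk value has the bits for signals 10 (SIGUSR1) and 15 (SIGTERM) set, among many others.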

I did find something that seems odd though when I was looking at the file
descriptors that this PID has open.

When I run lsof I see this (output filtered for brevity, excludes linked
libraries):

COMMAND PID   USER    FD TYPE DEVICE             SIZE/OFF NODE   NAME
haproxy 11537 haproxy 0u 0000 0,9                0        5350   anon_inode
haproxy 11537 haproxy 4u unix 0xffff88023306d200 0t0      126464 /tmp/haproxy.sock.11536.tmp
haproxy 11537 haproxy 5u IPv4 126465             0t0      TCP    192.168.10.15:http (LISTEN)
haproxy 11537 haproxy 6u IPv4 126466             0t0      UDP    *:58171
haproxy 11537 haproxy 7u IPv4 126467             0t0      TCP    192.168.200.100:http (LISTEN)
haproxy 11537 haproxy 8u IPv4 126468             0t0      TCP    192.168.200.120:http (LISTEN)
haproxy 11537 haproxy 9u IPv4 126469             0t0      TCP    192.168.200.110:http (LISTEN)

The first odd thing is that /tmp/haproxy.sock.11536.tmp does not exist. The
socket file that the stats socket uses is defined as /tmp/haproxy.sock in
haproxy.cfg. It's either odd or a coincidence that the number in the
filename is one less than the PID.

After running lsof I took a look at /proc/11537/fd:

ls -l /proc/11537/fd
total 0
lrwx------ 1 root root 64 Oct 30 15:29 0 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Oct 30 15:29 4 -> socket:[126464]
lrwx------ 1 root root 64 Oct 30 15:29 5 -> socket:[126465]
lrwx------ 1 root root 64 Oct 30 15:29 6 -> socket:[126466]
lrwx------ 1 root root 64 Oct 30 15:29 7 -> socket:[126467]
lrwx------ 1 root root 64 Oct 30 15:29 8 -> socket:[126468]
lrwx------ 1 root root 64 Oct 30 15:29 9 -> socket:[126469]

It won't show here, but all of those symlinks are displayed as broken
(flashing red). Yet each one corresponds to an FD in the lsof output, which
shows haproxy actively listening on those sockets. And if I connect to any
of those IPs:ports, they pass traffic.
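
As far as I understand, link targets such as socket:[126464] and anon_inode:[eventpoll] are kernel object names rather than filesystem paths, so ls colors them as dangling even when the descriptor is healthy. Below is a minimal sketch for listing and classifying the targets, assuming a Linux /proc layout. The function names classify_fd_target and list_fds are hypothetical helpers of mine.

```python
import os
import re

# Matches targets such as "socket:[126464]" or "anon_inode:[eventpoll]".
_TARGET_RE = re.compile(r"^(?P<kind>[a-z_]+):\[?(?P<id>[^\]]+)\]?$")


def classify_fd_target(target):
    """Split a /proc/<pid>/fd link target into (kind, identifier)."""
    if target.startswith("/"):
        # A real filesystem path, e.g. a regular file or a named socket.
        return ("path", target)
    match = _TARGET_RE.match(target)
    if match:
        return (match.group("kind"), match.group("id"))
    return ("unknown", target)


def list_fds(pid):
    """Return {fd: (kind, identifier)} for a process; needs permission to read it."""
    fd_dir = "/proc/{}/fd".format(pid)
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        result[int(name)] = classify_fd_target(target)
    return result


if __name__ == "__main__":
    # Inspect the current process as a harmless demonstration.
    for fd, info in sorted(list_fds(os.getpid()).items()):
        print(fd, info)
```

The socket inode numbers returned here are the same ones lsof prints in its NODE column for the unix socket and in its DEVICE column for the IPv4 sockets, which is how the two listings above line up.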

Looking at tcp_bind_listener() in proto_tcp.c (comment on line 782), it
indicates that the desired behavior is to reuse file descriptors instead of
creating a new socket. Would these orphaned file descriptors indicate that
a new socket was created instead of the file descriptors being reused? I'm
wondering if that is the case, given the behavior I saw with kernel 2.6 and
the "cannot bind socket" message on reloads. With SO_REUSEPORT in kernel
3.9 and later, additional processes are allowed to bind to these same
IPs:ports, and that may be masking the issue on 3.9+ kernels.

The other thing I'm wondering is whether signal_unregister_handler() in
sig_soft_stop() in haproxy.c has removed all of the signal handlers that
this PID would otherwise be listening for, and that's why it is
unresponsive to anything other than SIGKILL. soft_stop() is called right
before signal_unregister_handler(), so is it possible that something went
sideways while executing soft_stop(), leaving this PID sort of in limbo?
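
For what it's worth, a blocked signal and a signal with no handler behave differently: a blocked signal is held as pending and is delivered once the mask is lifted. The sketch below demonstrates that on the current process only. It is generic POSIX behavior shown from Python, not HAProxy's signal code, and the function name is my own.

```python
import os
import signal


def blocked_signal_stays_pending(sig=signal.SIGUSR1):
    """Send `sig` to ourselves while it is blocked and report what happens.

    Returns (pending_while_blocked, delivered_while_blocked, delivered_after).
    Must be called from the main thread on a Unix system.
    """
    received = []
    # Install a handler so that unblocking does not trigger the default action.
    old_handler = signal.signal(sig, lambda signum, frame: received.append(signum))
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        os.kill(os.getpid(), sig)
        pending_while_blocked = sig in signal.sigpending()
        delivered_while_blocked = bool(received)
    finally:
        # Restoring the previous mask lets the pending signal through.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    delivered_after = bool(received)
    if old_handler is not None:
        signal.signal(sig, old_handler)
    return pending_while_blocked, delivered_while_blocked, delivered_after


if __name__ == "__main__":
    print(blocked_signal_stays_pending())
```

If the stuck process really has these signals blocked, I would expect them to show up as pending somewhere in /proc/<pid>/status rather than being dropped, which is part of why the all-zero SigPnd puzzles me.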

Regards,
Chris


On Fri, Oct 30, 2015 at 3:28 PM, Vincent Bernat <[email protected]> wrote:

>  ❦ 30 octobre 2015 15:14 -0400, Chris Riley <[email protected]> :
>
> > SigQ: 3/63840
> > SigPnd: 0000000000000000
> > SigBlk: fffffffe7bfa7a26
> > SigIgn: 0000000000001000
> > SigCgt: 0000000180300205
>
> SigIgn is correct (SIGPIPE) is ignored. However, SigBlk seems
> incorrect. HAProxy only blocks signals when dequeuing them. However, no
> signal is pending either, so they should be delivered? Maybe it was bad
> luck? If you try again, does SigBlk become 0?
> --
> Don't stop with your first draft.
>             - The Elements of Programming Style (Kernighan & Plauger)
>
