I'm using service_loadbalancer from Kubernetes (
https://github.com/kubernetes/contrib/tree/master/service-loadbalancer ).
This program re-spawns haproxy whenever it detects a change in the upstream
endpoints.
When service_loadbalancer starts, it runs haproxy -sf $(cat pidfile)
several times in quick succession. In that situation haproxy -sf has no
effect, and many old haproxy processes are left running.

I ran strace on the new haproxy -sf and confirmed that it did send SIGUSR1 to
the old process named in the pidfile.
I tried sending SIGUSR1, SIGTERM and SIGINT to the leftover haproxy processes
while strace-ing them, but they never received the signals. I checked the
/proc/PID/status file: the SigBlk line is fffffffc7bfa7a27. SIGKILL can
still kill the old process.
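
To make the SigBlk value easier to read, here is a minimal sketch that decodes the mask. It assumes the usual Linux convention that bit (N - 1) of the mask stands for signal number N, and it hardcodes the x86-64 signal numbers (they differ on some other architectures). The names LINUX_SIGNALS, is_blocked and blocked_names are made up for this illustration.

```python
# Signal numbers for Linux on x86-64 (assumption: other architectures differ).
LINUX_SIGNALS = {"SIGINT": 2, "SIGKILL": 9, "SIGUSR1": 10, "SIGTERM": 15}


def is_blocked(sigblk_hex, signum):
    """Return True if signal `signum` is set in a SigBlk mask from /proc/PID/status."""
    mask = int(sigblk_hex, 16)
    # Bit (signum - 1) of the mask corresponds to signal number `signum`.
    return bool((mask >> (signum - 1)) & 1)


def blocked_names(sigblk_hex):
    """Return the names from LINUX_SIGNALS that are blocked in the mask, sorted."""
    return sorted(
        name for name, num in LINUX_SIGNALS.items() if is_blocked(sigblk_hex, num)
    )


if __name__ == "__main__":
    # The mask reported for the leftover haproxy process.
    print(blocked_names("fffffffc7bfa7a27"))
```

For the mask above this reports SIGINT, SIGTERM and SIGUSR1 as blocked and SIGKILL as not blocked, which matches what I observed when sending the signals by hand.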

I searched for SIG_SETMASK in the source code and found it in signal.c.
Maybe SIGUSR1 should be removed from the blocked_sig set in the signal_init()
function?


Inspecting the old process with gdb:
(gdb) where
#0 0x00007f42b2916763 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00000000004d47db in _do_poll (p=0x73a5e0 , exp=1633798236) at src/ev_epoll.c:125
#2 0x000000000040a8ef in run_poll_loop () at src/haproxy.c:1576
#3 0x000000000040b438 in main (argc=8, argv=0x7fff4168bd08) at src/haproxy.c:1912
(gdb) print jobs
$1 = 4
(gdb) print stopping
$2 = 0

ls -l /proc/46/fd/
total 0
lrwx------ 1 root root 64 Dec 2 06:00 0 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Dec 2 06:00 4 -> socket:[33335699]
lrwx------ 1 root root 64 Dec 2 06:00 5 -> socket:[33335700]
lrwx------ 1 root root 64 Dec 2 06:00 6 -> socket:[33335701]
lrwx------ 1 root root 64 Dec 2 06:00 7 -> socket:[33335702]

The process has four sockets, which matches the value of the jobs variable.

lsof -p 46 | grep 3333
haproxy 46 root 4u unix 0xffff88102014e900 0t0 33335699 /tmp/haproxy.45.tmp
haproxy 46 root 5u IPv4 33335700 0t0 TCP *:jetcmeserver (LISTEN)
haproxy 46 root 6u IPv4 33335701 0t0 TCP *:http (LISTEN)
haproxy 46 root 7u IPv4 33335702 0t0 TCP *:mysql (LISTEN)

Inspecting the old process with strace:
Process 46 attached
epoll_wait(0, {}, 200, 1000) = 0
epoll_wait(0, {}, 200, 1000) = 0
epoll_wait(0, {}, 200, 1000) = 0
epoll_wait(0, {}, 200, 1000) = 0
......

I don't know at what point SIGUSR1 got blocked.
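
One thing that may be worth ruling out (this is not a diagnosis of the haproxy case) is that the mask was already set before haproxy ran any of its own code: a blocked-signal mask is inherited across fork() and preserved across execve(). The sketch below only demonstrates that general behaviour with a Python parent and a Python child. The helper name child_sees_sigusr1_blocked and the CHILD snippet are made up for this illustration.

```python
import signal
import subprocess
import sys

# Child program: prints 1 if SIGUSR1 is in its own blocked mask, else 0.
# Calling pthread_sigmask(SIG_BLOCK, []) changes nothing and returns the current mask.
CHILD = (
    "import signal; "
    "print(int(signal.SIGUSR1 in signal.pthread_sigmask(signal.SIG_BLOCK, [])))"
)


def child_sees_sigusr1_blocked(block_in_parent):
    """Spawn a child and report whether it starts with SIGUSR1 blocked."""
    how = signal.SIG_BLOCK if block_in_parent else signal.SIG_UNBLOCK
    # Adjust the calling thread's mask and remember the previous one.
    old_mask = signal.pthread_sigmask(how, [signal.SIGUSR1])
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD],
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        # Restore the parent's original mask whatever happened.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print("blocked in parent   ->", child_sees_sigusr1_blocked(True))
    print("unblocked in parent ->", child_sees_sigusr1_blocked(False))
```

If the same effect applied here, comparing the SigBlk line of the process that launches haproxy with that of a freshly started haproxy would show whether the mask comes from the parent or is set later.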
