On Wed, May 30, 2018 at 04:47:31PM +0200, William Dauchy wrote:
> Hello William L.,
>

Hi William D. :-)

> I did some more testing:
> I simplified my config, removing the multi binding part and cpu-map.
> Conclusion is, I have this issue when I activate nbthread feature
> (meaning no problem without).
>
> I tried to kill -USR1 the failing worker, but it remains.
>
> Here are the Sig* from status file of one of the failing process:
> SigQ: 0/192448
> SigPnd: 0000000000000000
> SigBlk: 0000000000000800
> SigIgn: 0000000000001800
> SigCgt: 0000000180300205
>

I can reproduce the same situation here, although I disabled the seamless
reload. When sending -USR1 to a remaining worker and running strace on it,
I can see that the signal is not blocked and that the worker is still polling.

My guess is that something is preventing the worker from leaving. I tried to
inspect the threads with gdb, but none of them seems to be in a deadlock. I
have to investigate more.

I'm not sure that this is related at all to the timing of the reload, but I
could be wrong.

> About the timing of reload, it seems to take a few seconds most of the
> time, so I *think* I am not reloading before another is not yet done,
> but I would appreciate whether I can check this fact through a file
> before sending the reload; do you have any hint?

I think that with Type=notify, systemd does not try to reload while a previous
reload is still in progress. You could grep for 'reloading' in the output of
'systemctl status haproxy' to check that.

Unfortunately the only way to know when the service is ready is through
systemd, but I plan to make the status available on the stats socket in the
future.

-- 
William Lallemand
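
The Sig* values quoted above are hexadecimal bitmasks in which the least
significant bit stands for signal 1. A minimal sketch for decoding them is
shown below. The helper name decode_mask is only illustrative and is not part
of HAProxy or any tool mentioned in this thread.

```python
def decode_mask(hex_mask):
    """Return the signal numbers set in a /proc/<pid>/status hex mask.

    Bit 0 (least significant) corresponds to signal 1, bit 1 to signal 2,
    and so on.
    """
    mask = int(hex_mask, 16)
    return [bit + 1 for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Values copied from the status file quoted in this thread.
masks = {
    "SigPnd": "0000000000000000",
    "SigBlk": "0000000000000800",
    "SigIgn": "0000000000001800",
    "SigCgt": "0000000180300205",
}

for name, value in masks.items():
    print(name, decode_mask(value))
```

Assuming the usual x86-64 Linux numbering (10 is SIGUSR1, 12 is SIGUSR2, 13 is
SIGPIPE), this would show SIGUSR1 in the caught set and absent from the
blocked set, which matches the strace observation that the signal is not
blocked.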