On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
> On Tue, Apr 19, 2016 at 02:54:35PM +0200, Lukas Tribus wrote:
> > > We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> > > situation after some reloads (-sf). The old haproxy process does not
> > > exit and uses 100% CPU, strace showing:
> > >
> > > epoll_wait(0, {}, 200, 0) = 0
> > > epoll_wait(0, {}, 200, 0) = 0
> > > epoll_wait(0, {}, 200, 0) = 0
> > > epoll_wait(0, {}, 200, 0) = 0
> > > epoll_wait(0, {}, 200, 0) = 0
> > > epoll_wait(0, {}, 200, 0) = 0
> > >
> > > In our case, it was a TCP backend tunnelling rsyslog messages. After
> > > restarting the local rsyslogd, the load was gone and the old haproxy
> > > instance exited. It's hard to tell how many reloads it takes to make
> > > haproxy go crazy, or what an exact reproducible test would be. But it
> > > does not take hundreds of restarts, rather 10-20 (our reloads are not
> > > very frequent), to make haproxy go crazy.
> >
> > Also matches this report from December:
> > https://www.mail-archive.com/haproxy@formilux.org/msg20772.html
>
> Yep, very likely. The combination of the two reports is very intriguing.
> The first one shows the signals being blocked, while the only place where
> we block them is in __signal_process_queue(), and only while calling the
> handlers or performing the wakeup() calls, both of which should be
> instantaneous; more importantly, the function cannot return without
> unblocking the signals.
>
> I still have no idea what is going on; the code looks simple and clear,
> and certainly not compatible with such behaviours. I'm still digging.
OK, in fact it's different. Above we have a busy polling loop, which may
very well be caused by the buffer space miscalculation bug, and which
results in a process not completing its job until a timeout strikes. The
link to the other report shows normal polling with blocked signals.

Willy