On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
> On Tue, Apr 19, 2016 at 02:54:35PM +0200, Lukas Tribus wrote:
> > >We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> > >situation after some reloads (-sf). The old haproxy process does not
> > >exit and uses 100% CPU; strace shows:
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >epoll_wait(0, {}, 200, 0)               = 0
> > >
> > >In our case, it was a TCP backend tunnelling rsyslog messages. After
> > >restarting the local rsyslogd, the load was gone and the old haproxy
> > >instance exited. It's hard to tell how many reloads it takes to make
> > >haproxy go crazy, or what an exact reproducible test would be, but it
> > >does not take hundreds of reloads, rather 10-20 (our reloads are not
> > >very frequent).
> > 
> > Also matches this report from December:
> > https://www.mail-archive.com/haproxy@formilux.org/msg20772.html
> 
> Yep, very likely. The combination of the two reports is very intriguing.
> The first one shows the signals being blocked, while the only place where
> we block them is __signal_process_queue(), and only while calling the
> handlers or performing the wakeup() calls, both of which should be
> instantaneous; more importantly, the function cannot return without
> unblocking the signals.
> 
> I still have no idea what is going on; the code looks simple and clear,
> and certainly not compatible with such behaviour. I'm still digging.

OK, in fact it's different. Above we have a busy polling loop, which may
very well be caused by the buffer-space miscalculation bug and which
results in a process not completing its job until a timeout strikes. The
link to the other report shows normal polling with blocked signals.
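
For illustration, the spin in the strace above boils down to an event loop
that keeps polling with a zero timeout because it believes work is still
pending. Roughly like this (a minimal standalone sketch, not haproxy's
code; compute_poll_timeout() and the work_still_pending flag are only
placeholders for the scheduler state):

/*
 * Sketch of the spin seen in the strace: the loop believes work is still
 * pending, asks for a zero poll timeout, and epoll_wait() returns
 * immediately with no events, burning 100% CPU without making progress.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

/* "How long may we sleep?"  A healthy loop returns -1 (or the next
 * timer's delay) when nothing is runnable, so epoll_wait() blocks.
 * If a buffer-space miscalculation leaves a task that can never
 * complete, this keeps returning 0 and the loop degenerates into the
 * epoll_wait(..., 0) = 0 spin from the strace. */
static int compute_poll_timeout(int work_still_pending)
{
    return work_still_pending ? 0 : -1;
}

int main(void)
{
    struct epoll_event events[200];
    int work_still_pending = 1;      /* the buggy state: never cleared */
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return 1;

    for (int i = 0; i < 5; i++) {    /* bounded here; the real loop spins forever */
        int timeout = compute_poll_timeout(work_still_pending);
        int n = epoll_wait(epfd, events, 200, timeout);

        printf("epoll_wait(%d, {}, 200, %d) = %d\n", epfd, timeout, n);
        /* ...run tasks; the stuck task never clears work_still_pending... */
    }
    close(epfd);
    return 0;
}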
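
As for the other report, the signal masking being discussed is,
schematically, the classic block/process/unblock sequence below (again
only an illustrative sketch under that assumption, not haproxy's actual
__signal_process_queue(); the queue layout and handler type are made up):

#include <signal.h>

struct queued_sig {
    void (*handler)(int sig);
    int sig;
};

#define MAX_QUEUED 16
static struct queued_sig sig_queue[MAX_QUEUED];
static int sig_queue_len;
static sigset_t blocked_sig;   /* filled at init time, e.g. with sigfillset() */

void process_signal_queue(void)
{
    sigset_t old_sig;

    /* mask delivery so nothing re-enters while we walk the queue */
    sigprocmask(SIG_SETMASK, &blocked_sig, &old_sig);

    for (int i = 0; i < sig_queue_len; i++)
        sig_queue[i].handler(sig_queue[i].sig);  /* expected to be instantaneous */
    sig_queue_len = 0;

    /* the function cannot return without restoring the previous mask,
     * so signals can never stay blocked once processing is done */
    sigprocmask(SIG_SETMASK, &old_sig, NULL);
}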

Willy

