On Fri, Mar 03, 2017 at 07:54:46PM +0300, Dmitry Sivachenko wrote:
> 
> > On 03 Mar 2017, at 19:36, David King <[email protected]> wrote:
> > 
> > Thanks for the response!
> > That's interesting. I don't suppose you have the details of the other issues?
> 
> 
> First report is 
> https://www.mail-archive.com/[email protected]/msg25060.html
> Second one
> https://www.mail-archive.com/[email protected]/msg25067.html

Thanks for the links Dmitry.

That's indeed really odd. If they all hang at the same time, timing or
uptime looks like a good candidate. There's not much that is really
specific to FreeBSD in haproxy. However, the kqueue poller is only used
there (and on OpenBSD), and it relies on time computations for its
timeout. Thus it sounds likely that there could be an issue there,
either in haproxy or in FreeBSD.

A hang every 2-3 months makes me think of the 49.7 days it takes for
a 32-bit millisecond counter to wrap. These bugs are hard to
troubleshoot. We had such an issue a long time ago in Linux 2.4, when
the timer was set to 100 Hz: it took 497 days to find out whether the
bug was fixed or not (obviously it now is).
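
For reference, the two figures above come from dividing 2^32 ticks by
the tick rate. The sketch below is only an illustration of that
arithmetic and of why a wrapping counter bites: it is not haproxy or
kernel code, and the names wrap_days(), expired_naive() and
expired_wrapsafe() are made up for the example.

```c
#include <stdint.h>

/* Days until an unsigned counter of `bits` bits (bits < 64) wraps
 * when it is incremented `hz` times per second. */
double wrap_days(unsigned bits, double hz)
{
    double ticks = (double)(1ULL << bits);
    return ticks / hz / 86400.0;   /* 86400 seconds per day */
}

/* Naive check: compares absolute values, so it gives the wrong answer
 * when the deadline has wrapped past zero but `now` has not yet. */
int expired_naive(uint32_t now, uint32_t deadline)
{
    return now >= deadline;
}

/* Wrap-safe check: looks at the signed difference instead. Valid as
 * long as now and deadline are less than 2^31 ticks apart. The cast
 * assumes the usual two's complement conversion. */
int expired_wrapsafe(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

wrap_days(32, 1000) gives about 49.7 days and wrap_days(32, 100) about
497 days. With now = 0xFFFFFFF8 and a deadline 24 ticks later (which
wraps to 0x10), the naive check already reports the timer as expired
while the wrap-safe one does not.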

I've just compared ev_epoll.c and ev_kqueue.c in case I could spot
anything obvious, but from what I'm seeing they're pretty much similar,
so I don't see what in there could cause this bug. And since it
apparently works fine on FreeBSD 10, at best one of our bugs could only
trigger a system bug, if one exists.

David, if your workload permits it, you can disable kqueue and haproxy
will automatically fall back to poll. For this you can simply put
"nokqueue" in the global section. poll() doesn't scale as well as
kqueue(): it's cheaper at low connection counts but it will use more
CPU above ~1000 concurrent connections.
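
A minimal fragment would look like this, assuming your existing global
section keeps all its other settings as they are:

```
global
    nokqueue
```

If you'd rather not touch the config for a quick test, starting
haproxy with -dk should have the same effect, and "haproxy -vv" lists
the pollers that were built in, so you can check what it can fall back
to.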

Regards,
Willy
