On Mon, 06 Mar 2017 01:35:19 -0500, Willy Tarreau <[email protected]> wrote:
On Fri, Mar 03, 2017 at 07:54:46PM +0300, Dmitry Sivachenko wrote:
> On 03 Mar 2017, at 19:36, David King <[email protected]> wrote:
>
> Thanks for the response!
> That's interesting, I don't suppose you have the details of the other
> issues?
First report:
https://www.mail-archive.com/[email protected]/msg25060.html
Second one:
https://www.mail-archive.com/[email protected]/msg25067.html
Thanks for the links Dmitry.
That's indeed really odd. If they all hang at the same time, timing or uptime
looks like a good candidate. There's not much in haproxy that is really
specific to FreeBSD. However, the kqueue poller is only used there (and on
OpenBSD), and it depends on timing for its timeout. So an issue there seems
plausible, either in haproxy or in FreeBSD.
A hang every 2-3 months makes me think of the 49.7 days it takes for a
32-bit millisecond counter to wrap. These bugs are hard to troubleshoot. We
had such an issue a long time ago in Linux 2.4, when the timer was set to
100 Hz: it took 497 days to know whether the bug was solved or not
(obviously it now is).
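The arithmetic behind those figures: an unsigned 32-bit counter holds 2^32 ticks, so it wraps after 2^32 ms (about 49.7 days) at 1 kHz and after 2^32 / 100 s (about 497 days) at 100 Hz. The Python sketch below checks that, and shows the usual failure mode next to a generic fix: a plain `now >= deadline` test never fires once the counter wraps past a pending deadline, while comparing the signed 32-bit difference still works. The function names are illustrative only; this is not haproxy's or FreeBSD's code.

```python
U32 = 1 << 32  # number of distinct values in an unsigned 32-bit counter


def wrap_period_days(hz: float) -> float:
    """Days until an unsigned 32-bit counter incremented `hz` times per second wraps."""
    return U32 / hz / 86_400


def naive_expired(deadline: int, now: int) -> bool:
    # Plain comparison: once `now` wraps to a small value, a deadline set just
    # before the wrap looks far in the future and never expires (until ~49.7 days later at 1 kHz).
    return now >= deadline


def wrap_safe_expired(deadline: int, now: int) -> bool:
    # Interpret the 32-bit difference as signed; valid as long as the two
    # values are less than 2**31 ticks apart (about 24.8 days at 1 kHz).
    diff = (now - deadline) & (U32 - 1)
    if diff >= 1 << 31:
        diff -= U32
    return diff >= 0


if __name__ == "__main__":
    print(f"1 kHz counter wraps after {wrap_period_days(1000):.1f} days")  # 49.7
    print(f"100 Hz counter wraps after {wrap_period_days(100):.1f} days")  # 497.1
    deadline = U32 - 5  # timer armed 5 ms before the wrap
    now = 5             # checked 10 ms later, just after the wrap
    print(naive_expired(deadline, now))      # False: the timer would hang
    print(wrap_safe_expired(deadline, now))  # True: correctly expired
```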
I've just compared ev_epoll.c and ev_kqueue.c in case I could spot
anything obvious, but from what I'm seeing they're pretty much similar,
so I don't see anything there that could cause this bug. And since it
apparently works fine on FreeBSD 10, at most a bug on our side could only
be triggering a system bug, if one exists.
David, if your workload permits it, you can disable kqueue and haproxy
will automatically fall back to poll. To do this, simply put
"nokqueue" in the global section. poll() doesn't scale as well as
kqueue(): it's cheaper at low connection counts, but it will use more
CPU above ~1000 concurrent connections.
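In the configuration, the workaround is a single `nokqueue` line in the `global` section. The Python sketch below is a hypothetical model of the fallback, not haproxy's poller-registration code: it assumes the polling systems are ranked by the `pref` values that `haproxy -vv` reports later in this thread (kqueue 300, poll 200, select 150) and that the highest-ranked usable one that is not disabled wins, which matches the "will use kqueue" line in that output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poller:
    name: str
    pref: int
    usable: bool = True  # corresponds to "test result OK" in `haproxy -vv`


# Preference values as reported by `haproxy -vv` on the FreeBSD build in this thread.
POLLERS = [Poller("kqueue", 300), Poller("poll", 200), Poller("select", 150)]


def choose_poller(pollers, disabled=frozenset()):
    """Hypothetical model: pick the highest-preference usable poller that is not disabled."""
    candidates = [p for p in pollers if p.usable and p.name not in disabled]
    if not candidates:
        raise RuntimeError("no usable polling system")
    return max(candidates, key=lambda p: p.pref).name


if __name__ == "__main__":
    print(choose_poller(POLLERS))                       # kqueue
    print(choose_poller(POLLERS, disabled={"kqueue"}))  # poll (the "nokqueue" case)
```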
Regards,
Willy
Hi Willy,
As for the timing issue, I can add a few related data points to the
discussion. In short, system uptime does not seem to be a common factor in
my case.
1) This issue affected 6 servers spread across 5 data centers (only 2
servers are in the same facility). All servers stopped processing
requests at roughly the same moment, certainly within the same minute.
All servers run FreeBSD 11.0-RELEASE-p2 with HAProxy compiled locally
against OpenSSL 1.0.2k.
2) System uptime was not at all similar across these servers, although
the HAProxy process start time was probably similar on most of them. The
servers with the highest uptime were at about 27 days at the time of the
incident, while the lowest were at only a day or two.
3) HAProxy configurations are similar but not identical across servers:
different frontend IPs, different ACLs and backends.
4) The only synchronizing application common to all of these servers is
OpenNTPD.
5) I have since upgraded to HAProxy 1.7.3 using the same build process
(full version output below), and will of course report any issues I observe.
haproxy -vv
HA-Proxy version 1.7.3 2017/02/28
Copyright 2000-2017 Willy Tarreau <[email protected]>
Build options :
TARGET = freebsd
CPU = generic
CC = clang
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
OPTIONS = USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
Compression algorithms supported : identity("identity")
Built with OpenSSL version : OpenSSL 1.0.2k 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.
Available filters :
[SPOE] spoe
[TRACE] trace
[COMP] compression
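Points 1, 2 and 4 suggest a way to narrow down the wrap hypothesis. If a 32-bit millisecond counter were derived from wall-clock time, every NTP-synchronized host would reach the wrap at the same instant regardless of uptime; if it counted from boot, hosts with 27 days and 1 day of uptime would wrap 26 days apart. (A counter started at process launch would also wrap together on hosts whose HAProxy processes started at about the same time.) The Python sketch below only illustrates these two hypothetical clock sources; it makes no claim about which one HAProxy or FreeBSD actually uses.

```python
U32 = 1 << 32       # a 32-bit millisecond counter wraps every 2**32 ms (~49.7 days)
DAY_MS = 86_400_000


def next_wrap_wallclock(now_ms: int) -> int:
    """Epoch time (ms) of the next wrap if the counter is wall-clock ms truncated to 32 bits."""
    return (now_ms // U32 + 1) * U32


def next_wrap_uptime(now_ms: int, boot_ms: int) -> int:
    """Epoch time (ms) of the next wrap if the counter counts ms since boot."""
    uptime = now_ms - boot_ms
    return boot_ms + (uptime // U32 + 1) * U32


if __name__ == "__main__":
    now = 1_488_800_000_000  # an arbitrary epoch timestamp in early March 2017
    hosts = {"uptime 27 days": now - 27 * DAY_MS, "uptime 1 day": now - 1 * DAY_MS}
    for label, boot in hosts.items():
        wall = (next_wrap_wallclock(now) - now) / DAY_MS
        up = (next_wrap_uptime(now, boot) - now) / DAY_MS
        print(f"{label}: wall-clock wrap in {wall:.1f} d, uptime wrap in {up:.1f} d")
```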
Cheers,
-=Mark