On Mon, 06 Mar 2017 01:35:19 -0500, Willy Tarreau <[email protected]> wrote:

On Fri, Mar 03, 2017 at 07:54:46PM +0300, Dmitry Sivachenko wrote:

> On 03 Mar 2017, at 19:36, David King <[email protected]> wrote:
>
> Thanks for the response!
> That's interesting. I don't suppose you have the details of the other issues?


First report is
https://www.mail-archive.com/[email protected]/msg25060.html
Second one
https://www.mail-archive.com/[email protected]/msg25067.html

Thanks for the links Dmitry.

That's indeed really odd. If all hang at the same time, timing or uptime
looks like a good candidate. There's not much which is really specific
to FreeBSD in haproxy. However, the kqueue poller is only used there
(and on OpenBSD), and uses timing for the timeout. Thus it sounds likely
that there could be an issue there, either in haproxy or FreeBSD.

A hang every 2-3 months makes me think about the 49.7 days it takes for
a 32-bit millisecond counter to wrap. These bugs are hard to
troubleshoot. We used to have such an issue a long time ago in Linux 2.4
when the timer was set to 100 Hz; it required 497 days to know whether
the bug was solved or not (obviously it now is).
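
For reference, the 49.7-day and 497-day figures above can be reproduced
with a few lines of arithmetic. The sketch below assumes an unsigned
32-bit tick counter; the function names are hypothetical and it is not
taken from haproxy or from any kernel. It also illustrates, under the
same assumption, why a naive deadline comparison can misfire around the
wrap point while a modular (signed-difference) comparison does not.

```python
MASK32 = 0xFFFFFFFF  # assumption: the tick counter is an unsigned 32-bit value


def wrap_days(ticks_per_second: int, counter_bits: int = 32) -> float:
    """Days until an unsigned counter of the given width wraps around."""
    seconds = (2 ** counter_bits) / ticks_per_second
    return seconds / 86400


def naive_expired(now: int, deadline: int) -> bool:
    """Plain comparison; gives wrong answers when the deadline has wrapped."""
    return now >= deadline


def wrap_safe_expired(now: int, deadline: int) -> bool:
    """Treat (now - deadline) modulo 2^32 as a signed 32-bit difference."""
    diff = (now - deadline) & MASK32
    return diff < 0x80000000


if __name__ == "__main__":
    print(f"1000 Hz (millisecond counter): {wrap_days(1000):.1f} days")
    print(f" 100 Hz (tick counter):        {wrap_days(100):.1f} days")

    # A timer armed 16 ticks before the wrap with a 32-tick timeout:
    armed_at = 0xFFFFFFF0
    deadline = (armed_at + 0x20) & MASK32   # wraps around to 0x10
    now = 0xFFFFFFF8                        # 8 ticks later, not yet expired
    print("naive:    ", naive_expired(now, deadline))
    print("wrap-safe:", wrap_safe_expired(now, deadline))
```

With a naive comparison the timer in this example looks expired 24 ticks
early, which is the kind of misbehaviour that only shows up once per wrap
period.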

I've just compared ev_epoll.c and ev_kqueue.c in case I could spot
anything obvious, but from what I'm seeing they're pretty much similar,
so I don't see what in there could cause this bug. And since it
apparently works fine on FreeBSD 10, at most one of our bugs could be
triggering a system bug, if one exists.

David, if your workload permits it, you can disable kqueue and haproxy
will automatically fall back to poll. For this you can simply put
"nokqueue" in the global section. poll() doesn't scale as well as
kqueue(): it's cheaper at low connection counts but it will use more
CPU above ~1000 concurrent connections.
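
As a minimal sketch, the change described above amounts to one extra
line in the configuration file. Only the "nokqueue" keyword comes from
the text; the rest of the global section is assumed to stay as it
already is on the affected machine.

```
global
    # disable the kqueue poller; haproxy falls back to poll()
    nokqueue
```

After a restart, the startup polling summary (the "Available polling
systems" part of the "haproxy -vv" output shows the candidates) should no
longer report kqueue as the poller in use.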

Regards,
Willy


Hi Willy,

As for the timing issue, I can add a few related data points to the discussion. In short, system uptime does not seem to be a common factor in my situation.

1) I had this issue affect 6 servers, spread across 5 data centers (only 2 servers are in the same facility). All servers stopped processing requests at roughly the same moment, certainly within the same minute. All servers were running FreeBSD 11.0-RELEASE-p2 with HAProxy compiled locally against OpenSSL-1.0.2k.

2) System uptime was not at all similar across these servers, although chances are most servers' HAProxy process start times would be similar. The servers with the highest system uptime were at about 27 days at the time of the incident, while the shortest were under a day or two.

3) HAProxy configurations are similar, but not identical between servers: different IPs on the frontend, different ACLs and backends.

4) The only synchronized application common to all of these servers is OpenNTPd.

5) I have since upgraded to HAProxy-1.7.3, using the same build process. The full version output is below, and I will of course report any observed issues.

haproxy -vv
HA-Proxy version 1.7.3 2017/02/28
Copyright 2000-2017 Willy Tarreau <[email protected]>

Build options :
  TARGET  = freebsd
  CPU     = generic
  CC      = clang
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
Compression algorithms supported : identity("identity")
Built with OpenSSL version : OpenSSL 1.0.2k  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
        [SPOE] spoe
        [TRACE] trace
        [COMP] compression

Cheers,
-=Mark
