Sorry, here is my haproxy -vv output as well:

| [email protected] ~ $ sudo haproxy -vv
| HA-Proxy version 1.7.0-1 2016/11/27
| Copyright 2000-2016 Willy Tarreau <[email protected]>
|
| Build options :
|   TARGET  = linux2628
|   CPU     = generic
|   CC      = gcc
|   CFLAGS  = -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
|   OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1
|
| Default settings :
|   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
|
| Encrypted password support via crypt(3): yes
| Built with zlib version : 1.2.8
| Running on zlib version : 1.2.8
| Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
| Built with OpenSSL version : OpenSSL 1.0.2j 26 Sep 2016
| Running on OpenSSL version : OpenSSL 1.0.2j 26 Sep 2016
| OpenSSL library supports TLS extensions : yes
| OpenSSL library supports SNI : yes
| OpenSSL library supports prefer-server-ciphers : yes
| Built with PCRE version : 8.35 2014-04-04
| Running on PCRE version : 8.35 2014-04-04
| PCRE library supports JIT : no (USE_PCRE_JIT not set)
| Built with Lua version : Lua 5.3.1
| Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
| Built with network namespace support
|
| Available polling systems :
|       epoll : pref=300, test result OK
|        poll : pref=200, test result OK
|      select : pref=150, test result OK
| Total: 3 (3 usable), will use epoll.
|
| Available filters :
|       [COMP] compression
|       [TRACE] trace
|       [SPOE] spoe
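
One extra note on the resolver side, in case it helps: the deb.domain.net record in figure 2 of my original mail below is served by dnsmasq straight out of /etc/hosts, so the only moving part there is a one-line hosts entry along these lines (sketch only; the real file of course has other entries too):

| 10.9.0.13    deb.domain.net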
Thanks,
Joshua M. Boniface
Linux System Ærchitect - Boniface Labs
Sigmentation fault: core dumped

On 29/11/16 02:17 AM, Joshua M. Boniface wrote:
> Hello list!
>
> I believe I've found a bug in haproxy related to multiproc combined with a
> set of DNS resolvers. When these two features (multiproc and dynamic
> resolvers) are used together, the DNS resolvers (one per process, it seems)
> fail intermittently and independently for no obvious reason, which triggers
> a DOWN event for the backend; a short time later the resolution succeeds
> and the backend goes back UP for a short time, and the cycle repeats
> indefinitely. This bug also seems to have the curious effect of causing the
> active record type to switch from A to AAAA and back to A repeatedly in a
> dual-stack setup, though the test below shows that the bug occurs in an
> IPv4-only environment as well, so that particular A/AAAA symptom is not
> captured in the test output below.
>
> First, some background. I'm attempting to set up an haproxy instance with
> multiple processes for SSL termination. At the same time, I'm also trying
> to use an IPv6 backend managed by DNS, so I set up a "resolvers" section so
> I could use resolved IPv6 addresses from AAAA records. As an aside, I've
> noticed that haproxy will not start up if the only record for a host is an
> AAAA record, reporting that the address can't be resolved. However, since I
> normally run a dual-stack [A + AAAA] record setup, this is not a huge deal
> to me, though I think supporting an IPv6/AAAA-only backend should
> definitely be a future goal!
>
> First, my config (figure 1). This is the most basic config I can construct
> that triggers the bug while keeping most of my important settings. Note
> that the host resolves to an A record only (figure 2); this record is
> provided by a dnsmasq process which has read it out of /etc/hosts, so the
> resolution here should be 100% stable.
>
> (figure 1)
> | global
> |     log ::1:514 daemon debug
> |     log-send-hostname
> |     chroot /var/lib/haproxy
> |     pidfile /run/haproxy/haproxy.pid
> |     nbproc 2
> |     cpu-map 1 0
> |     cpu-map 2 1
> |     stats socket /var/lib/haproxy/admin-1.sock mode 660 level admin process 1
> |     stats socket /var/lib/haproxy/admin-2.sock mode 660 level admin process 2
> |     stats timeout 30s
> |     user haproxy
> |     group haproxy
> |     daemon
> |     maxconn 10000
> | resolvers dns
> |     nameserver dnsmasq 127.0.0.1:53
> |     resolve_retries 1
> |     hold valid 1s
> |     hold timeout 1s
> |     timeout retry 1s
> | defaults
> |     log global
> |     option http-keep-alive
> |     option forwardfor except 127.0.0.0/8
> |     option redispatch
> |     option dontlognull
> |     option forwardfor
> |     timeout connect 5s
> |     timeout client 24h
> |     timeout server 60m
> | listen back_deb-http
> |     bind 0.0.0.0:80
> |     mode http
> |     option httpchk GET /debian/haproxy
> |     server deb deb.domain.net:80/debian check inter 1000 resolvers dns
>
> (figure 2)
> | [email protected] ~ $ dig @127.0.0.1 +short deb.domain.net
> | 10.9.0.13
>
> When this config starts [using the command in (figure 3)], within seconds I
> receive errors that the backend host has gone down due to a DNS resolution
> failure. This continues indefinitely; however, I stopped this test after
> one failure/return cycle for each process (figure 4).
>
> (figure 3)
> | [email protected] ~ $ sudo haproxy -f /etc/haproxy/haproxy.cfg; sudo strace -tp $(pgrep haproxy | head -1) &>~/haproxy-1.strace & sudo strace -tp $(pgrep haproxy | tail -1) &>~/haproxy-2.strace &
>
> (figure 4)
> | Nov 29 01:46:54 elb2.domain.net haproxy[29887]: Proxy back_deb-http started.
> | Nov 29 01:46:58 elb2.domain.net haproxy[29888]: Server back_deb-http/deb is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> | Nov 29 01:46:58 elb2.domain.net haproxy[29888]: proxy back_deb-http has no server available!
> | Nov 29 01:46:59 elb2.domain.net haproxy[29888]: Server back_deb-http/deb ('deb.domain.net') is UP/READY (resolves again).
> | Nov 29 01:46:59 elb2.domain.net haproxy[29888]: Server back_deb-http/deb administratively READY thanks to valid DNS answer.
> | Nov 29 01:47:00 elb2.domain.net haproxy[29889]: Server back_deb-http/deb is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
> | Nov 29 01:47:00 elb2.domain.net haproxy[29889]: proxy back_deb-http has no server available!
> | Nov 29 01:47:02 elb2.domain.net haproxy[29889]: Server back_deb-http/deb ('deb.domain.net') is UP/READY (resolves again).
> | Nov 29 01:47:02 elb2.domain.net haproxy[29889]: Server back_deb-http/deb administratively READY thanks to valid DNS answer.
>
> I've attached straces of both processes, taken with the command shown in
> (figure 3), as well as a pcap of the DNS resolution requests to dnsmasq
> captured on my loopback interface. The most curious thing I see in the DNS
> pcap is requests for AAAA records that don't exist; I don't know whether
> that's the cause, but it would explain the swapping between A and AAAA
> records that I mentioned earlier.
>
> I've seen this bug on haproxy 1.6.9 (from the Debian jessie-backports repo)
> as well as on my own self-built 1.7.0 package; all of the testing above was
> done on 1.7.0. Please let me know if I can provide any further information!
>
> Thanks,
> Joshua M. Boniface
> Linux System Ærchitect - Boniface Labs
> Sigmentation fault: core dumped
>
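
P.S. Since the pcap shows AAAA queries for a name that only has an A record, the same lookups can be repeated against dnsmasq by hand for comparison, along these lines (illustrative commands only; I haven't pasted fresh output here):

| [email protected] ~ $ dig @127.0.0.1 -t A deb.domain.net +short
| [email protected] ~ $ dig @127.0.0.1 -t AAAA deb.domain.net +short

I would expect the A query to return 10.9.0.13 as in figure 2, and the AAAA query to come back with an empty answer, since the host only has an IPv4 entry in /etc/hosts.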

