Hi! There seems to be some kind of problem when backend servers (in this case ELBs) change their IP addresses.

At some point, apparently, the ELB behind the DNS name in my config changed its address(es). Many of the haproxy instances we use as sidecars on our application servers failed their health checks afterwards with L4 timeouts. For testing, I reloaded haproxy on one of them, and the error went away.

The resolvers section has two servers: a local dnsmasq and the AWS VPC DNS server at the "magic" address 169.254.169.253.

On a different instance I captured some traffic. The pcap shows the DNS queries and responses for the backend server name going to both 127.0.0.1:53 and 169.254.169.253:53. Both servers reply with the same answers, carrying the current IPs (10.205.100.120 and 10.205.100.61). Those are the same as shown by dig in a shell.

However, haproxy apparently still uses an old address, 10.205.100.53, that the ELB probably had at some point -- hard to tell after the fact. In the pcap I can see "ICMP Host Unreachable" responses for attempts to connect to 10.205.100.53 on all the ports my backends specify.

At first I suspected length issues, but the responses are just 174 bytes long. If needed, I can provide the pcap privately.

This can be a real fun-killer when all the sidecars suddenly lose connection across tens of VMs...

Am I missing something in my config, or is this an actual (maybe known?) bug?

Configuration, dig output, and version info below.

Kind regards,
Daniel


dig output (actual name is a little longer, I cut off the name for brevity and privacy):
---------
[aws:staging-staging] root:~# dig @169.254.169.253 loadbalancer-internal.private

; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @169.254.169.253 loadbalancer-internal.private
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 501
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;loadbalancer-internal.private. IN A

;; ANSWER SECTION:
loadbalancer-internal.private. 11 IN A 10.205.100.120
loadbalancer-internal.private. 11 IN A 10.205.100.61

;; Query time: 0 msec
;; SERVER: 169.254.169.253#53(169.254.169.253)
;; WHEN: Mon Aug 27 15:51:48 CEST 2018
;; MSG SIZE rcvd: 141

[aws:staging-staging] root:~# dig @127.0.0.1 loadbalancer-internal.private

; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @127.0.0.1 loadbalancer-internal.private
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20706
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;loadbalancer-internal.private. IN A

;; ANSWER SECTION:
loadbalancer-internal.private. 5 IN A 10.205.100.61
loadbalancer-internal.private. 5 IN A 10.205.100.120

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Aug 27 15:51:54 CEST 2018
;; MSG SIZE rcvd: 130
---------

haproxy.cfg (there are more proxies in the real thing, but they are all the same, just for different ports):
---------------------
global
    log /dev/log len 350 local1 info
    log-tag haproxy
    stats socket /var/run/haproxy.stat user haproxy group haproxy mode 600 level admin
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    hard-stop-after 30s

defaults
    mode tcp
    log global
    option tcplog
    option dontlognull
    option http-keep-alive
    timeout http-request 10s
    timeout queue 1m
    timeout connect 5s
    timeout client 2m
    timeout server 2m
    timeout http-keep-alive 10s
    timeout check 5s
    retries 3
    maxconn 2000

resolvers default
    nameserver local 127.0.0.1:53
    nameserver aws 169.254.169.253:53

listen rabbitmq
    bind 127.0.0.1:5671
    option dontlog-normal
    server lb-internal loadbalancer-internal.private:5671 resolvers default check addr loadbalancer-internal.private port 5671
---------------------

Log:
--------------------
...
Aug 27 16:49:09 xxx haproxy[2090]: 127.0.0.1:35891 [27/Aug/2018:16:49:09.031] rabbitmq rabbitmq/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0
...
--------------------

Version info:
---------------------
[aws:staging-staging] root:~# haproxy -vvv
HA-Proxy version 1.7.11-1ppa1~trusty 2018/04/30
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
      epoll : pref=300, test result OK
       poll : pref=200, test result OK
     select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
    [COMP] compression
    [TRACE] trace
    [SPOE] spoe
---------------------

--
Daniel Schneller
Principal Cloud Engineer

CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com
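
P.S. One variant of the resolvers and listen sections that may be worth testing, sketched here under unverified assumptions. The resolver tuning values (resolve_retries, timeout retry, hold valid) are illustrative and not taken from the running setup. Dropping the explicit "addr" and "port" from the check is only a guess that the health check might otherwise keep using an address that was resolved once at startup. Without those two keywords the check targets the server's own address and port, which here are the same name and port anyway.

```
# Sketch only: values are illustrative assumptions, not a confirmed fix.
resolvers default
    nameserver local 127.0.0.1:53
    nameserver aws 169.254.169.253:53
    # How often to retry a query before giving up on a resolution
    resolve_retries 3
    # Wait time between two retries
    timeout retry 1s
    # How long a valid answer is kept before it is queried again
    hold valid 10s

listen rabbitmq
    bind 127.0.0.1:5671
    option dontlog-normal
    # No "addr"/"port" on the check: it then follows the server's
    # own (runtime-resolved) address and port 5671.
    server lb-internal loadbalancer-internal.private:5671 resolvers default check
```

If the stale 10.205.100.53 still shows up in a pcap with this variant, the check address would be ruled out as the cause.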