Hi Daniel,

On Tue, Aug 28, 2018 at 1:46 AM Daniel Schneller <daniel.schnel...@centerdevice.com> wrote:

> Hi!
>
> There seems to be some kind of problem when backend servers (in this case
> ELBs) change their IP addresses.
>
> At some point, apparently, the ELB behind the DNS name in my config
> changed its address(es). Lots of the haproxys we use as sidecars on our
> application servers failed their health checks afterwards with L4
> timeouts. For testing, I reloaded haproxy on one of them, and the error
> went away.
>
> The resolvers section has two servers: a local dnsmasq and the AWS VPC DNS
> server at the "magic" address 169.254.169.253.
>
> On a different instance I captured some traffic. The pcap shows the DNS
> queries and responses for the backend server name going to both
> 127.0.0.1:53 and 169.254.169.253:53. Both servers reply with the same
> answers, carrying the current IPs (10.205.100.120 and 10.205.100.61),
> which are also what dig returns in a shell.
>
> However, haproxy apparently still uses an old address, 10.205.100.53, that
> the ELB probably had at some point -- hard to tell after the fact. In the
> pcap I can see "ICMP Host Unreachable" responses for attempts to connect
> to 10.205.100.53 on all the ports my backends specify.
>
> At first I suspected response-size issues, but the DNS responses are just
> 174 bytes long.
> If needed, I can provide the pcap privately.
>
> This can be a real fun-killer when all the sidecars suddenly lose
> connection across tens of VMs...
>
> Am I missing something in my config, or is this an actual (maybe known?) bug?
>
> Configuration, dig output, and version info below.
>
> Kind regards,
>
> Daniel
>
>
>
>
> dig output (the actual name is a little longer; I shortened it for brevity
> and privacy):
> ---------
> [aws:staging-staging] root:~# dig @169.254.169.253 loadbalancer-internal.private
>
> ; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @169.254.169.253 loadbalancer-internal.private
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 501
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
>
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 4096
> ;; QUESTION SECTION:
> ;loadbalancer-internal.private. IN A
>
> ;; ANSWER SECTION:
> loadbalancer-internal.private. 11 IN A 10.205.100.120
> loadbalancer-internal.private. 11 IN A 10.205.100.61
>
> ;; Query time: 0 msec
> ;; SERVER: 169.254.169.253#53(169.254.169.253)
> ;; WHEN: Mon Aug 27 15:51:48 CEST 2018
> ;; MSG SIZE  rcvd: 141
>
>
>
> [aws:staging-staging] root:~# dig @127.0.0.1 loadbalancer-internal.private
>
> ; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @127.0.0.1 loadbalancer-internal.private
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20706
> ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;loadbalancer-internal.private. IN A
>
> ;; ANSWER SECTION:
> loadbalancer-internal.private. 5 IN A 10.205.100.61
> loadbalancer-internal.private. 5 IN A 10.205.100.120
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Mon Aug 27 15:51:54 CEST 2018
> ;; MSG SIZE  rcvd: 130
> ---------
>
>
>
> haproxy.cfg (there are more proxies in the real thing, but they are all
> the same, just for different ports):
> ---------------------
> global
>   log /dev/log len 350 local1 info
>   log-tag haproxy
>   stats socket /var/run/haproxy.stat user haproxy group haproxy mode 600 level admin
>   chroot /var/lib/haproxy
>   user haproxy
>   group haproxy
>   hard-stop-after 30s
>
>
> defaults
>   mode tcp
>   log global
>   option tcplog
>   option dontlognull
>   option http-keep-alive
>   timeout http-request 10s
>   timeout queue 1m
>   timeout connect 5s
>   timeout client 2m
>   timeout server 2m
>   timeout http-keep-alive 10s
>   timeout check 5s
>   retries 3
>   maxconn 2000
>
> resolvers default
>   nameserver local 127.0.0.1:53
>   nameserver aws 169.254.169.253:53
>
>
Maybe try tuning the "hold valid" parameter, see
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#5.3.2
The default value for "valid" is 10s (30s for the other statuses), so
setting it to 1s would make more sense when the backend IPs change often.
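Applied to the resolvers section quoted above, that would look roughly like
this. It is an untested sketch: 1s is simply the value suggested here, not a
measured optimum. As far as I understand 1.7, re-resolution is driven by the
server's health checks, so "check" has to stay on the server line (it already
is in your config) and the name won't be re-resolved more often than the
checks run.

```
resolvers default
  nameserver local 127.0.0.1:53
  nameserver aws 169.254.169.253:53
  # keep a "valid" answer for 1s instead of the default 10s before re-resolving
  hold valid 1s
```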



>
> listen rabbitmq
>   bind 127.0.0.1:5671
>   option dontlog-normal
>   server lb-internal loadbalancer-internal.private:5671 resolvers default check addr loadbalancer-internal.private port 5671
> ---------------------
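Since the config already exposes an admin-level stats socket, one way to
confirm which address haproxy is really using (as opposed to what dig
returns) is the "show servers state" command. Below is a minimal Python
sketch, assuming the socket path and the rabbitmq /
loadbalancer-internal.private names from the config above; the helper
names (query_stats_socket, parse_servers_state, resolve_ipv4,
stale_servers, report_stale) are made up for illustration. It reads the
column names from the "#" header line instead of hard-coding positions,
because the column set differs between haproxy versions. Note that
resolve_ipv4 uses the system resolver via getaddrinfo, not haproxy's own
resolvers section.

```python
import socket


def query_stats_socket(path, command, timeout=5.0):
    """Send one command to an haproxy stats socket and return the raw reply.

    Outside of interactive "prompt" mode, haproxy answers a single command
    and then closes the connection, so we simply read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_servers_state(text):
    """Turn "show servers state" output into a list of dicts.

    The output starts with a format version line, followed by a "#" header
    naming the columns; every later non-empty line describes one server.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            columns = line.lstrip("#").split()
        elif columns is not None:  # lines before the header (version) are ignored
            rows.append(dict(zip(columns, line.split())))
    return rows


def resolve_ipv4(hostname):
    """IPv4 addresses returned by the system resolver (not haproxy's resolvers)."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def stale_servers(rows, backend, current_ips):
    """(server name, address) pairs in backend whose address DNS no longer returns."""
    return [
        (row["srv_name"], row["srv_addr"])
        for row in rows
        if row.get("be_name") == backend and row.get("srv_addr") not in current_ips
    ]


def report_stale(sock_path="/var/run/haproxy.stat", backend="rabbitmq",
                 hostname="loadbalancer-internal.private"):
    """Print every server in backend whose current address is not in DNS."""
    state = query_stats_socket(sock_path, "show servers state " + backend)
    ips = resolve_ipv4(hostname)
    for name, addr in stale_servers(parse_servers_state(state), backend, ips):
        print(f"{backend}/{name} uses {addr}, DNS returns {sorted(ips)}")
```

Running report_stale() as root on an affected sidecar should print the
server only if haproxy still holds an address that DNS no longer returns;
no output means the two agree.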
>
>
>
> Log:
> --------------------
> ...
> Aug 27 16:49:09 xxx haproxy[2090]: 127.0.0.1:35891 [27/Aug/2018:16:49:09.031] rabbitmq rabbitmq/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0
> ...
> --------------------
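Reading the log line above as I understand the tcplog format: "<NOSRV>"
means no server was selected, "-1" for Tc means a connection to a server
was never established, and the "SC" termination state means the session
was aborted or refused on the server side while haproxy was still waiting
for the connection. That looks consistent with the only server having been
marked down by its failing checks. A small Python sketch that splits such
lines into named fields and spells out the two termination-state
characters, following the field layout in the tcplog section of the
configuration manual (the regex and helper names are my own, not part of
haproxy):

```python
import re

# Field layout of an "option tcplog" line; re.search skips the syslog prefix
# ("Aug 27 16:49:09 host haproxy[pid]: ") in front of the client address.
TCPLOG_RE = re.compile(
    r"(?P<client_ip>\S+):(?P<client_port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"(?P<termination_state>\S\S) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)

# First character of the termination state: what ended the session.
END_CAUSE = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "P": "aborted prematurely by the proxy",
    "L": "handled locally by haproxy",
    "R": "a proxy resource was exhausted",
    "I": "internal error detected by the proxy",
    "D": "killed because the server was detected as down",
    "U": "killed on a backup server because an active server came up",
    "K": "killed by an admin",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}

# Second character: which phase the session was in when it ended.
END_PHASE = {
    "R": "waiting for a complete request (HTTP only)",
    "Q": "waiting in the queue",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for complete response headers (HTTP only)",
    "D": "in the data phase",
    "L": "still sending the last data to the client",
    "T": "request was tarpitted",
    "-": "normal completion",
}


def parse_tcplog(line):
    """Return the named fields of a tcplog line as a dict, or None."""
    match = TCPLOG_RE.search(line)
    return match.groupdict() if match else None


def explain_termination(state):
    """Spell out a two-character termination state such as "SC"."""
    return (END_CAUSE.get(state[0], "unknown cause"),
            END_PHASE.get(state[1], "unknown phase"))
```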
>
>
>
> Version info:
> ---------------------
> [aws:staging-staging] root:~# haproxy -vvv
> HA-Proxy version 1.7.11-1ppa1~trusty 2018/04/30
> Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>
>
> Build options :
>   TARGET  = linux2628
>   CPU     = generic
>   CC      = gcc
>   CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
>   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>
> Encrypted password support via crypt(3): yes
> Built with zlib version : 1.2.8
> Running on zlib version : 1.2.8
> Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports prefer-server-ciphers : yes
> Built with PCRE version : 8.31 2012-07-06
> Running on PCRE version : 8.31 2012-07-06
> PCRE library supports JIT : no (libpcre build without JIT?)
> Built with Lua version : Lua 5.3.1
> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
> Built with network namespace support
>
> Available polling systems :
>       epoll : pref=300,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.
>
> Available filters :
> [COMP] compression
> [TRACE] trace
> [SPOE] spoe
> ---------------------
>
>
> --
> Daniel Schneller
> Principal Cloud Engineer
>
> CenterDevice GmbH
> Rheinwerkallee 3
> 53227 Bonn
> www.centerdevice.com
>

-- 
Igor Cicimov | DevOps


p. +61 (0) 433 078 728
e. ig...@encompasscorporation.com
w. www.encompasscorporation.com
a. Level 4, 65 York Street, Sydney 2000
