Hi!

There seems to be some kind of problem when backend servers (in this case ELBs)
change their IP addresses.

At some point, apparently, the ELB behind the DNS name in my config changed its
address(es). Lots of the haproxy instances we use as sidecars on our application
servers then failed their health checks with L4 timeouts. As a test, I reloaded
haproxy on one of them, and the error went away.

The resolvers section has two servers: a local dnsmasq and the AWS VPC DNS
server at the "magic" address 169.254.169.253.

On a different instance I captured some traffic. The pcap shows the DNS queries
and responses for the backend server name going to both 127.0.0.1:53 and
169.254.169.253:53. Both servers reply with the same answers, carrying the
current IPs (10.205.100.120 and 10.205.100.61). These are the same addresses
that dig shows in a shell.

However, haproxy apparently still uses an old address, 10.205.100.53, which the
ELB probably had at some point (hard to tell after the fact). In the pcap I can
see "ICMP Host Unreachable" responses to attempts to connect to 10.205.100.53 on
all the ports my backends specify.
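
One way to check which address haproxy is actually using, without a pcap, is to
ask the running process over the admin stats socket from the global section.
The Python sketch below is a hypothetical helper, not part of haproxy: it sends
`show servers state` to /var/run/haproxy.stat and flags every server whose
`srv_addr` is not among the IPs dig returned. It assumes the 1.6/1.7 output
layout (a format-version line, then a `#`-prefixed header naming the columns)
and looks columns up by header name rather than by position.

```python
"""Hypothetical helper: compare haproxy's current server addresses with DNS.

Assumes the `show servers state` layout of haproxy 1.6/1.7: a format-version
line, then a '#'-prefixed header naming the columns, then one line per server.
"""
import socket

STATS_SOCKET = "/var/run/haproxy.stat"             # from the global section
CURRENT_IPS = {"10.205.100.120", "10.205.100.61"}  # what dig returned


def parse_servers_state(text):
    """Return one dict per server line, keyed by the header's column names."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header = line.lstrip("#").split()
        elif header is not None:  # lines before the header (the version) are skipped
            rows.append(dict(zip(header, line.split())))
    return rows


def stale_servers(state_text, resolved_ips):
    """Return (backend, server, address) for servers not on a resolved IP."""
    return [
        (row["be_name"], row["srv_name"], row["srv_addr"])
        for row in parse_servers_state(state_text)
        if row.get("srv_addr") not in resolved_ips
    ]


def query_stats_socket(path, command):
    """Send one command over the UNIX stats socket and read the whole reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall((command + "\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # haproxy closes the socket after a single command
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    try:
        state = query_stats_socket(STATS_SOCKET, "show servers state")
    except OSError as exc:
        print("cannot query", STATS_SOCKET, "->", exc)
    else:
        for backend, server, addr in stale_servers(state, CURRENT_IPS):
            print(f"{backend}/{server} still uses {addr}")
```

Run periodically on a sidecar, this would show a stale address as soon as it
appears, before the health checks start failing on every instance.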

At first I suspected length issues, but the responses are just 174 bytes long.
If needed, I can provide the pcap privately.
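
For scale, the size of an A response can be worked out from the DNS wire format
(RFC 1035). The sketch below is illustrative only: it assumes each answer refers
back to the question name with a 2-byte compression pointer (common, but not
mandatory) and that an EDNS0 OPT record, when present, carries no options. With
the shortened name used in this mail, two A records come to 79 bytes, or 90 with
EDNS0; the real, longer name adds one byte per extra character or label. The
11-byte difference between the two dig replies (141 vs. 130 bytes) matches the
OPT record that only the AWS resolver sends back, and all of these sizes are far
below the classic 512-byte limit for DNS over UDP.

```python
"""Back-of-the-envelope DNS response size for an A query (RFC 1035 wire format).

Assumes answers reuse the question name via a 2-byte compression pointer and
that an EDNS0 OPT record, if present, carries no options.
"""

HEADER = 12                       # ID, flags, and four 16-bit section counts
A_ANSWER = 2 + 2 + 2 + 4 + 2 + 4  # name pointer, TYPE, CLASS, TTL, RDLENGTH, IPv4
OPT_RR = 1 + 2 + 2 + 4 + 2        # root name, TYPE, UDP size, ext. RCODE/flags, RDLENGTH


def encoded_name_len(name):
    """Wire length of an uncompressed name: length byte + label, plus a final 0."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def a_response_size(name, n_answers, edns=False):
    """Total message size for a response with n_answers A records."""
    question = encoded_name_len(name) + 4  # QTYPE + QCLASS
    return HEADER + question + n_answers * A_ANSWER + (OPT_RR if edns else 0)


if __name__ == "__main__":
    name = "loadbalancer-internal.private"
    print(a_response_size(name, 2))             # 79
    print(a_response_size(name, 2, edns=True))  # 90
```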

This can be a real fun-killer when all the sidecars suddenly lose connection
across tens of VMs...

Am I missing something in my config, or is this an actual (maybe known?) bug?

Configuration, dig output, and version info below.

Kind regards,

Daniel




dig output (the actual name is a little longer; I shortened it for brevity and
privacy):
---------
[aws:staging-staging] root:~# dig @169.254.169.253 loadbalancer-internal.private

; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @169.254.169.253 loadbalancer-internal.private
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 501
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;loadbalancer-internal.private. IN A

;; ANSWER SECTION:
loadbalancer-internal.private. 11 IN    A 10.205.100.120
loadbalancer-internal.private. 11 IN    A 10.205.100.61

;; Query time: 0 msec
;; SERVER: 169.254.169.253#53(169.254.169.253)
;; WHEN: Mon Aug 27 15:51:48 CEST 2018
;; MSG SIZE  rcvd: 141



[aws:staging-staging] root:~# dig @127.0.0.1 loadbalancer-internal.private

; <<>> DiG 9.9.5-3ubuntu0.17-Ubuntu <<>> @127.0.0.1 loadbalancer-internal.private
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20706
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;loadbalancer-internal.private. IN A

;; ANSWER SECTION:
loadbalancer-internal.private. 5 IN A 10.205.100.61
loadbalancer-internal.private. 5 IN A 10.205.100.120

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Aug 27 15:51:54 CEST 2018
;; MSG SIZE  rcvd: 130
---------
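
The two dig checks above can also be scripted, which makes it easy to repeat
them against both nameservers from the resolvers section while the problem is
happening. This is a minimal Python sketch using only the standard library, not
what haproxy does internally: it sends one plain UDP query for an A record (no
EDNS0, no retries, no TCP fallback) and prints the TTL and address of each A
record in the answer section. The nameserver addresses and the shortened name
come from this mail; everything else is an assumption of the sketch.

```python
"""Minimal A-record lookup over UDP, standard library only (a diagnostic sketch).

No EDNS0, no retries, no TCP fallback; only the response ID is validated.
"""
import random
import socket
import struct

NAMESERVERS = ("127.0.0.1", "169.254.169.253")  # from the resolvers section
NAME = "loadbalancer-internal.private"          # shortened name from this mail


def build_query(name, qid):
    """Encode a standard query (RD set) with one question: NAME IN A."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(msg, offset):
    """Return the offset just past a possibly compressed name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # a compression pointer ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(msg, qid):
    """Return [(ttl, ip)] for every IN A record in the answer section."""
    rid, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    if rid != qid:
        raise ValueError("response ID does not match the query")
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            records.append((ttl, socket.inet_ntoa(msg[offset:offset + 4])))
        offset += rdlength
    return records


def lookup(server, name, timeout=2.0):
    """Send one query to server:53 and return the parsed A records."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        msg, _addr = sock.recvfrom(4096)
    return parse_a_records(msg, qid)


if __name__ == "__main__":
    for ns in NAMESERVERS:
        try:
            print(ns, lookup(ns, NAME))
        except (OSError, ValueError) as exc:
            print(ns, "lookup failed:", exc)
```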



haproxy.cfg (there are more proxies in the real thing, but they are all the
same, just for different ports):
---------------------
global
  log /dev/log len 350 local1 info
  log-tag haproxy
  stats socket /var/run/haproxy.stat user haproxy group haproxy mode 600 level admin
  chroot /var/lib/haproxy
  user haproxy
  group haproxy
  hard-stop-after 30s


defaults
  mode tcp
  log global
  option tcplog
  option dontlognull
  option http-keep-alive
  timeout http-request 10s
  timeout queue 1m
  timeout connect 5s
  timeout client 2m
  timeout server 2m
  timeout http-keep-alive 10s
  timeout check 5s
  retries 3
  maxconn 2000

resolvers default
  nameserver local 127.0.0.1:53
  nameserver aws 169.254.169.253:53


listen rabbitmq
  bind 127.0.0.1:5671
  option dontlog-normal
  server lb-internal loadbalancer-internal.private:5671 resolvers default check addr loadbalancer-internal.private port 5671
---------------------



Log:
--------------------
...
Aug 27 16:49:09 xxx haproxy[2090]: 127.0.0.1:35891 [27/Aug/2018:16:49:09.031] rabbitmq rabbitmq/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0
...
--------------------
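
The fields of that tcplog line can be decoded mechanically. The Python sketch
below is a hypothetical helper that assumes the default `option tcplog` layout
and maps only a subset of the termination-state codes; the HAProxy
configuration manual remains the authoritative reference. For the line above it
reports `<NOSRV>` (aborted before any server was reached), Tw and Tc of -1 (the
connection never reached the queue and no server connection was established),
and the state `SC`, which the manual describes as the server, or something
between it and haproxy, refusing the connection (for example with a TCP RST or
an ICMP message). That would fit the single server having been marked down
after its L4 checks failed, though the log line alone does not prove it.

```python
"""Hypothetical decoder for one default `option tcplog` line (sketch only)."""
import re

# client_ip:port [accept_date] frontend backend/server Tw/Tc/Tt bytes term
# actconn/feconn/beconn/srv_conn/retries srv_queue/backend_queue
TCPLOG_RE = re.compile(
    r"(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] (?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) (?P<bytes>\+?\d+) "
    r"(?P<term>\S\S) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/"
    r"(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)

# A subset of the termination-state codes described in the HAProxy manual.
FIRST = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "P": "aborted by haproxy itself",
    "R": "a local resource on the proxy was exhausted",
    "I": "internal error in haproxy",
    "c": "client-side timeout",
    "s": "server-side timeout",
    "-": "normal termination",
}
SECOND = {
    "R": "waiting for the client's request",
    "Q": "waiting in a queue for a connection slot",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for the server's response headers",
    "D": "in the data transfer phase",
    "L": "sending the last data to the client",
    "T": "the request was tarpitted",
    "-": "normal completion",
}


def explain(line):
    """Parse one tcplog line; return (fields, human-readable notes)."""
    m = TCPLOG_RE.search(line)
    if m is None:
        raise ValueError("not a default tcplog line")
    f = m.groupdict()
    notes = []
    if f["server"] == "<NOSRV>":
        notes.append("aborted before any server was reached (<NOSRV>)")
    if f["tw"] == "-1":
        notes.append("never reached the queue (Tw = -1)")
    if f["tc"] == "-1":
        notes.append("no server connection was established (Tc = -1)")
    first, second = f["term"]
    notes.append("session: " + FIRST.get(first, "unknown code " + first))
    notes.append("state: " + SECOND.get(second, "unknown code " + second))
    return f, notes


if __name__ == "__main__":
    sample = ("127.0.0.1:35891 [27/Aug/2018:16:49:09.031] rabbitmq "
              "rabbitmq/<NOSRV> -1/-1/0 0 SC 0/0/0/0/0 0/0")
    for note in explain(sample)[1]:
        print(note)
```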



Version info:
---------------------
[aws:staging-staging] root:~# haproxy -vvv
HA-Proxy version 1.7.11-1ppa1~trusty 2018/04/30
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [COMP] compression
        [TRACE] trace
        [SPOE] spoe
---------------------


--
Daniel Schneller
Principal Cloud Engineer

CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com

__________________________________________
Managing Directors: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach,
Commercial register no.: HRB 18655, Registry court: Bonn, VAT ID: DE-815299431

This e-mail, including any attached files, contains confidential and/or legally
protected information. If you are not the intended recipient or have received
this e-mail in error, please notify the sender immediately and delete this
e-mail and any attached files without delay. Unauthorized copying, use, or
opening of any attached files, as well as unauthorized forwarding of this
e-mail, is not permitted.

