Hi everyone!
I assume I am misunderstanding something, but I cannot figure out what it is.
We are using haproxy in AWS, in this case as sidecars to applications, so they
need not know about changing backend addresses at all but can always talk to
localhost. Haproxy listens on localhost and then forwards traffic to an ELB
instance.
This works great, but there have now been two occasions where, due to a change
in the ELB's IP addresses, our services went down because the backends could
no longer be reached. I don't understand why haproxy sticks to the old IP
address instead of going to one of the updated ones.
There is a resolvers section which points to the local dnsmasq instance (it is
there to send some requests to consul, but that's not used here). All other
traffic is forwarded on to the AWS DNS server set via DHCP.
I managed to get timely updates and updated backend servers when using
server-template, but from what I understand this should not really be
necessary for this.
This is the trimmed-down sidecar config. I have not made any changes to DNS
timeouts etc.
resolvers default
    # dnsmasq
    nameserver local 127.0.0.1:53
listen regular
    bind 127.0.0.1:9300
    option dontlog-normal
    server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300
listen templated
    bind 127.0.0.1:9200
    option dontlog-normal
    option httpchk /haproxy-simple-healthcheck
    server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
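For reference, the DNS timeouts mentioned above would be set in the resolvers section. The following is only a sketch: the keywords are those documented for 1.8, but every value is an illustrative assumption, not taken from the running config, and none of it has been tested against the behaviour described here.

```haproxy
resolvers default
    # dnsmasq
    nameserver local 127.0.0.1:53
    # Illustrative values only (assumptions, not the deployed settings)
    resolve_retries 3       # number of queries sent before giving up on a resolution
    timeout resolve 1s      # time between two successive resolutions of a name
    timeout retry   1s      # time to wait before re-sending an unanswered query
    hold valid      10s     # how long the last valid answer is kept
    hold nx         30s     # how long the last NXDOMAIN status is kept
    hold timeout    30s     # how long the last "timeout" status is kept
```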
To simulate changing ELB addresses, I added entries for
loadbalancer-internal.xxx.yyy to /etc/hosts, so that I can control them via
dnsmasq. I tried different scenarios, but could not reliably predict what
would happen in all cases.
The address ending in 52 (marked as "valid" below) is a currently valid IP for
the ELB (as of the time of testing). The one ending in 199 (marked "invalid")
is an unused private IP address in my VPC.
Starting with /etc/hosts:
10.205.100.52 loadbalancer-internal.xxx.yyy # valid
10.205.100.199 loadbalancer-internal.xxx.yyy # invalid
haproxy starts and reports:
regular:   lb-internal  UP/L7OK
templated: lb-internal1 DOWN/L4TOUT
           lb-internal2 UP/L7OK
That's expected. Now when I edit /etc/hosts to _only_ contain the _invalid_
address and restart dnsmasq, I would expect both proxies to go fully down. But
only the templated proxy behaves like that:
regular:   lb-internal  UP/L7OK
templated: lb-internal1 DOWN/L4TOUT
           lb-internal2 MAINT (resolution)
Reloading haproxy in this state leads to:
regular:   lb-internal  DOWN/L4TOUT
templated: lb-internal1 MAINT (resolution)
           lb-internal2 DOWN/L4TOUT
After fixing /etc/hosts to include the valid server again and restarting dnsmasq:
regular:   lb-internal  DOWN/L4TOUT
templated: lb-internal1 UP/L7OK
           lb-internal2 DOWN/L4TOUT
Shouldn't the regular proxy also recognize the change and bring the backend up
or down depending on the DNS change? I have waited for several health check
rounds (seeing the status toggle between "* L4TOUT" and "L4TOUT"), but it
still never updates.
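One way to cross-check the states listed above is the runtime API, which needs a stats socket. The fragment below is a sketch under assumptions: the trimmed config shows no global section, and the socket path is hypothetical.

```haproxy
global
    # Hypothetical socket path; adjust to the local setup.
    # "level admin" also allows state-changing commands, so keep the mode restrictive.
    stats socket /var/run/haproxy.sock mode 600 level admin
```

With such a socket in place, the runtime commands "show resolvers" and "show servers state" can be sent to it (for example with socat) to see the per-nameserver counters and the address each server currently holds.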
I also tried having _only_ the invalid address in /etc/hosts and then
restarting haproxy. The regular backend never recognizes it when I add the
valid one back in. The templated one does, _unless_ I set it up to have only 1
instead of 2 server slots. In that case it, too, will only pick up the valid
server when reloaded. On the other hand, it _will_ recognize on the next
health check, without a reload, when I remove the valid server, but it will
_not_ bring it back in and mark the proxy UP when the address comes back.
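Since the results differ between a running and a freshly started haproxy, the address a server gets at startup may matter. In 1.8 that is controlled per server by the init-addr keyword, whose documented methods are last, libc, none, or a literal IP. The sketch below is an untested assumption, not a known fix: it is the regular proxy from the config above with only "init-addr none" added, so the server address is not resolved through libc at startup.

```haproxy
listen regular
    bind 127.0.0.1:9300
    option dontlog-normal
    # init-addr none (assumption, untested here): start the server without an
    # address, in a down state, until the resolvers section provides one
    server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default init-addr none check addr loadbalancer-internal.xxx.yyy port 9300
```

Comparing a start with and without this keyword would at least show whether the startup resolution or the runtime resolution is responsible for the address that sticks.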
I assume my understanding of something here is broken, and I would gladly be
told about it :)
Thanks a lot!
Daniel
Version Info:
------------------
$ haproxy -vv
HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
Copyright 2000-2019 Willy Tarreau <[email protected]>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
--
Daniel Schneller
Principal Cloud Engineer
CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com