Hi Daniel, Baptiste,

@Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from the server check? It seems to me that that name is not being resolved by the 'resolvers', and even if it were, it would be redundant in this example since it is the same as the server name. I'm not sure how many of the scenarios below are explained by this, though.

@Baptiste, is it intentional that a wrong 'addr' DNS name makes haproxy fail to start despite the supposedly never-failing 'default-server init-addr last,libc,none'? Would it be a reasonable feature request to support re-resolving a DNS name for the 'addr' setting as well?
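
Roughly what I mean, as an untested sketch based on Daniel's 'regular' proxy (same names and ports as in his config):

listen regular
    bind 127.0.0.1:9300
    option dontlog-normal
    # keep the last known address, fall back to libc, and finally start
    # with no address at all, so startup cannot fail on resolution
    default-server init-addr last,libc,none
    # no 'addr' override; the check uses the server's own resolved address
    server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check port 9300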

Regards,
PiBa-NL (Pieter)

On 21-3-2019 at 20:37, Daniel Schneller wrote:
Hi!

Thanks for the response. I had looked at the "hold" directives, but since they
all seem to have reasonable defaults, I did not touch them.
I specified 10s explicitly, but it did not make a difference.
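
For illustration, a resolvers section with the hold periods set explicitly to 10s would look roughly like this (a sketch only; the nameserver line is the one from the config further down):

resolvers default
    nameserver local 127.0.0.1:53
    # explicit hold periods instead of relying on the defaults
    hold valid    10s
    hold obsolete 10s
    hold nx       10s
    hold refused  10s
    hold timeout  10s
    hold other    10s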

I did some more tests, however, and it seems to have more to do with the number
of responses for the initial(?) DNS queries.
Hopefully these three tables make sense and don't get mangled in the mail.
The "templated" proxy is defined via "server-template" with 3 "slots"; the
"regular" one just as "server".


Test 1: Start out with both "valid" and "broken" DNS entries. Then comment
out/add back one at a time as described in (1)-(5).
Each time after changing /etc/hosts, restart dnsmasq and check haproxy via hatop.
Haproxy was started fresh once dnsmasq was set up to (1).

                        |  state           state
             /etc/hosts |  regular         templated
            ------------|------------------------------------
(1)         BRK        |  UP/L7OK         DOWN/L4TOUT
             VALID      |                  MAINT/resolution
                        |                  UP/L7OK
            ------------|------------------------------------
(2)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
             #VALID     |                  MAINT/resolution
                        |                  MAINT/resolution
            ------------|------------------------------------
(3)         #BRK       |  UP/L7OK         UP/L7OK
             VALID      |                  MAINT/resolution
                        |                  MAINT/resolution
            ------------|------------------------------------
(4)         BRK        |  UP/L7OK         UP/L7OK
             VALID      |                  DOWN/L4TOUT
                        |                  MAINT/resolution
            ------------|------------------------------------
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
             #VALID     |                  MAINT/resolution
                        |                  MAINT/resolution
This all looks normal and as expected. As soon as the "VALID" DNS entry is present, the
UP state follows within a few seconds.

Test 2: Start out "valid only" (1) and proceed as described in (2)-(5), again
restarting dnsmasq each time; haproxy was reloaded after dnsmasq was set up to (1).

                        |  state           state
             /etc/hosts |  regular         templated
            ------------|------------------------------------
(1)         #BRK       |  UP/L7OK         MAINT/resolution
             VALID      |                  MAINT/resolution
                        |                  UP/L7OK
            ------------|------------------------------------
(2)         BRK        |  UP/L7OK         DOWN/L4TOUT
             VALID      |                  MAINT/resolution
                        |                  UP/L7OK
            ------------|------------------------------------
(3)         #BRK       |  UP/L7OK         MAINT/resolution
             VALID      |                  MAINT/resolution
                        |                  UP/L7OK
            ------------|------------------------------------
(4)         BRK        |  UP/L7OK         DOWN/L4TOUT
             VALID      |                  MAINT/resolution
                        |                  UP/L7OK
            ------------|------------------------------------
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
             #VALID     |                  MAINT/resolution
                        |                  MAINT/resolution
Everything good here, too. Adding the broken DNS entry does not bring the proxies down
until only the broken one is left.



Test 3: Start out "broken only" (1).
Again, same as before, haproxy restarted once dnsmasq was initialized to (1).

                        |  state           state
             /etc/hosts |  regular         templated
            ------------|------------------------------------
(1)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
             #VALID     |                  MAINT/resolution
                        |                  MAINT/resolution
            ------------|------------------------------------
(2)         BRK        |  DOWN/L4TOUT     UP/L7OK
             VALID      |                  MAINT/resolution
                        |                  MAINT/resolution
            ------------|------------------------------------
(3)         #BRK       |  UP/L7OK         MAINT/resolution
             VALID      |                  UP/L7OK
                        |                  MAINT/resolution
            ------------|------------------------------------
(4)         BRK        |  UP/L7OK         DOWN/L4TOUT
             VALID      |                  UP/L7OK
                        |                  MAINT/resolution
            ------------|------------------------------------
(5)         BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
             #VALID     |                  MAINT/resolution
                        |                  MAINT/resolution


Here it becomes interesting. In (1) both regular and templated proxies are
DOWN, of course. However, adding a second DNS response in (2) brings the
templated proxy UP, but the regular one stays DOWN. Only when in (3) the valid
response is the only one presented does it go UP as well. Adding the broken one
back (4) is of no consequence then. And again, after leaving just the broken
response (5), both correctly go DOWN.

So it would appear that if haproxy starts with just a single "broken" DNS
response, adding a healthy one later on is not recognized. Instead, it stays
DOWN. "Replacing" the single broken response with a single "valid" response,
however, brings it to life, and it won't be discouraged by bringing the broken
one back in.

Tests 1 and 2 make sense to me, but test 3 I don't understand. For now, I have
worked around the issue by defining all my relevant backends with
server-template and at least 2 slots (sketched again right below), but I would
still like to understand it. And maybe it is a bug, after all ;)
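
For reference, this is the shape of the workaround; it is essentially the "templated" proxy from my original config quoted below, repeated here only to show the slot count:

listen templated
    bind 127.0.0.1:9200
    option dontlog-normal
    option httpchk /haproxy-simple-healthcheck
    # at least 2 slots even for a single logical backend; in my tests this
    # is what made newly returned DNS records get picked up reliably
    server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299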

Kind regards, and thanks for a great piece of software!

Daniel





On 21. Mar 2019, at 14:28, Bruno Henc <[email protected]> wrote:

Hello Daniel,


You might be missing the 'hold valid' directive in your resolvers section:
https://www.haproxy.com/documentation/hapee/1-9r1/onepage/#5.3.2-timeout

This should force HAProxy to fetch the DNS record values from the resolver.

A reload of the HAProxy instance also forces the instances to query all records 
from the resolver.

Can you please retest with the updated configuration and report back the 
results?


Best regards,

Bruno Henc

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, March 21, 2019 12:09 PM, Daniel Schneller 
<[email protected]> wrote:

Hello!

Friendly bump :)
I'd be willing to amend the documentation once I understand what's going on :D

Cheers,
Daniel

On 18. Mar 2019, at 20:28, Daniel Schneller [email protected] 
wrote:
Hi everyone!
I assume I am misunderstanding something, but I cannot figure out what it is.
We are using haproxy in AWS, in this case as sidecars to applications, so they
need not know about changing backend addresses at all but can always talk to
localhost. Haproxy listens on localhost and then forwards traffic to an ELB
instance.

This works great, but there have been two occasions now where, due to a change
in the ELB's IP addresses, our services went down because the backends could
not be reached anymore. I don't understand why haproxy sticks to the old IP
address instead of going to one of the updated ones.
There is a resolvers section which points to the local dnsmasq instance (there
to send some requests to consul, but that's not used here). All other traffic
is forwarded on to the AWS DNS server set via DHCP. I managed to get timely
updates and updated backend servers when using server-template, but from what I
understand this should not really be necessary for this.

This is the trimmed-down sidecar config. I have not made any changes to DNS
timeouts etc.:
resolvers default
    # dnsmasq
    nameserver local 127.0.0.1:53

listen regular
    bind 127.0.0.1:9300
    option dontlog-normal
    server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300

listen templated
    bind 127.0.0.1:9200
    option dontlog-normal
    option httpchk /haproxy-simple-healthcheck
    server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
To simulate changing ELB addresses, I added entries for
loadbalancer-internal.xxx.yyy to /etc/hosts, so I could control them via
dnsmasq. I tried different scenarios, but could not reliably predict what would
happen in all cases. The address ending in 52 (marked as "valid" below) is a
currently (as of the time of testing) valid IP for the ELB. The one ending in
199 (marked "invalid") is an unused private IP address in my VPC.

Starting with /etc/hosts:

    10.205.100.52  loadbalancer-internal.xxx.yyy # valid
    10.205.100.199 loadbalancer-internal.xxx.yyy # invalid
haproxy starts and reports:

    regular:    lb-internal    UP/L7OK
    templated:  lb-internal1   DOWN/L4TOUT
                lb-internal2   UP/L7OK

That's expected. Now when I edit /etc/hosts to only contain the invalid address
and restart dnsmasq, I would expect both proxies to go fully down. But only the
templated proxy behaves like that:

    regular:    lb-internal    UP/L7OK
    templated:  lb-internal1   DOWN/L4TOUT
                lb-internal2   MAINT (resolution)

Reloading haproxy in this state leads to:

    regular:    lb-internal    DOWN/L4TOUT
    templated:  lb-internal1   MAINT (resolution)
                lb-internal2   DOWN/L4TOUT

After fixing /etc/hosts to include the valid server again and restarting dnsmasq:

    regular:    lb-internal    DOWN/L4TOUT
    templated:  lb-internal1   UP/L7OK
                lb-internal2   DOWN/L4TOUT
Shouldn't the regular proxy also recognize the change and bring the backend up
or down depending on the DNS change? I have waited for several health check
rounds (seeing "* L4TOUT" and "L4TOUT" toggle), but it still never updates.

I also tried having only the invalid address in /etc/hosts, then restarting
haproxy. The regular backends will never recognize it when I add the valid one
back in. The templated one does, unless I set it up to have only 1 instead of 2
server slots; in that case it will also only pick up the valid server when
reloaded. On the other hand, it will recognize on the next health check when I
remove the valid server without a reload, but it will not bring the backend
back in and make the proxy UP when the valid address comes back.

I assume my understanding of something here is broken, and I would gladly be
told about it :)

Thanks a lot!
Daniel

Version Info:

--------------

$ haproxy -vv
HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
Copyright 2000-2019 Willy Tarreau [email protected]
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat 
-Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing 
-Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 
USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
--
Daniel Schneller
Principal Cloud Engineer
CenterDevice GmbH
Rheinwerkallee 3
53227 Bonn
www.centerdevice.com




