Re: DNS Resolver Issues

Baptiste Mon, 25 Mar 2019 14:55:22 -0700

Hi all,

Thanks @daniel for your very detailed report and @Piba for your help.
As Piba pointed out, the issue is related to the 'addr' parameter.
Currently, the only component in HAProxy which can benefit from dynamic
resolution at run time is the 'server', which means any other object using
a DNS hostname which does not resolve at start up may trigger an error,
like you discovered with 'addr'.
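
As a minimal sketch of the workaround Piba suggested (untested, and only an
illustration), the 'regular' proxy from Daniel's report could drop 'addr' so
that the health check follows the server's own address, which is the one
updated by run-time resolution. Names and ports are taken from Daniel's
config; the 'init-addr' line is the setting Piba mentioned and is included
here as an optional assumption:

```
resolvers default
    nameserver local 127.0.0.1:53

listen regular
    bind 127.0.0.1:9300
    option dontlog-normal
    # Optional (assumption): the 'init-addr' setting Piba mentioned, meant
    # to let haproxy start even if the server name does not resolve yet.
    default-server init-addr last,libc,none
    # No 'addr' on the check: health checks then go to the server's own
    # address, which the 'resolvers' section keeps up to date at run time.
    server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check port 9300
```

With 'addr <hostname>' present, that hostname is only resolved at start up,
which is where the error comes from.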


@Piba, feel free to file a feature request on GitHub and Cc me there, so
we can discuss this point.

Baptiste



On Sat, Mar 23, 2019 at 2:53 PM PiBa-NL <[email protected]> wrote:

> Hi Daniel, Baptiste,
>
> @Daniel, can you remove the 'addr loadbalancer-internal.xxx.yyy' from
> the server check? It seems to me that this name is not being resolved by
> the 'resolvers'. And even if it were, it would be redundant in the
> example, since it is the same as the server name. I am not sure how many
> of the scenarios below are explained by this, though.
>
> @Baptiste, is it intentional that a wrong 'addr' DNS name makes haproxy
> fail to start despite having the supposedly never-failing
> 'default-server init-addr last,libc,none'? Would it be a good
> feature request to support re-resolving a DNS name for the addr setting
> as well?
>
> Regards,
> PiBa-NL (Pieter)
>
> On 21-3-2019 at 20:37, Daniel Schneller wrote:
> > Hi!
> >
> > Thanks for the response. I had looked at the "hold" directives, but
> > since they all seem to have reasonable defaults, I did not touch them.
> > I specified 10s explicitly, but it did not make a difference.
> >
> > I did some more tests, however, and it seems to have more to do with the
> > number of responses for the initial(?) DNS queries.
> > Hopefully these three tables make sense and don't get mangled in the
> > mail. The "templated" proxy is defined via "server-template" with 3
> > "slots". The "regular" one just as "server".
> >
> >
> > Test 1: Start out with both "valid" and "broken" DNS entries. Then
> > comment out/add back one at a time as described in (1)-(5).
> > Each time after changing /etc/hosts, restart dnsmasq and check haproxy
> > via hatop.
> > Haproxy started fresh once dnsmasq was set up to (1).
> >
> >                         |  state           state
> >              /etc/hosts |  regular         templated
> >             ------------|------------------------------------
> > (1)          BRK        |  UP/L7OK         DOWN/L4TOUT
> >              VALID      |                  MAINT/resolution
> >                         |                  UP/L7OK
> >             ------------|------------------------------------
> > (2)          BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
> >              #VALID     |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (3)          #BRK       |  UP/L7OK         UP/L7OK
> >              VALID      |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (4)          BRK        |  UP/L7OK         UP/L7OK
> >              VALID      |                  DOWN/L4TOUT
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (5)          BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
> >              #VALID     |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >
> > This all looks normal and as expected. As soon as the "VALID" DNS entry
> > is present, the UP state follows within a few seconds.
> >
> >
> >
> > Test 2: Start out "valid only" (1) and proceed as described in (2)-(5),
> > again restarting dnsmasq each time, and haproxy reloaded after dnsmasq
> > was set up to (1).
> >
> >                         |  state           state
> >              /etc/hosts |  regular         templated
> >             ------------|------------------------------------
> > (1)          #BRK       |  UP/L7OK         MAINT/resolution
> >              VALID      |                  MAINT/resolution
> >                         |                  UP/L7OK
> >             ------------|------------------------------------
> > (2)          BRK        |  UP/L7OK         DOWN/L4TOUT
> >              VALID      |                  MAINT/resolution
> >                         |                  UP/L7OK
> >             ------------|------------------------------------
> > (3)          #BRK       |  UP/L7OK         MAINT/resolution
> >              VALID      |                  MAINT/resolution
> >                         |                  UP/L7OK
> >             ------------|------------------------------------
> > (4)          BRK        |  UP/L7OK         DOWN/L4TOUT
> >              VALID      |                  MAINT/resolution
> >                         |                  UP/L7OK
> >             ------------|------------------------------------
> > (5)          BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
> >              #VALID     |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >
> > Everything good here, too. Adding the broken DNS entry does not bring
> > the proxies down until only the broken one is left.
> >
> >
> >
> > Test 3: Start out "broken only" (1).
> > Again, same as before, haproxy restarted once dnsmasq was initialized to
> > (1).
> >
> >                         |  state           state
> >              /etc/hosts |  regular         templated
> >             ------------|------------------------------------
> > (1)          BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
> >              #VALID     |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (2)          BRK        |  DOWN/L4TOUT     UP/L7OK
> >              VALID      |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (3)          #BRK       |  UP/L7OK         MAINT/resolution
> >              VALID      |                  UP/L7OK
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (4)          BRK        |  UP/L7OK         DOWN/L4TOUT
> >              VALID      |                  UP/L7OK
> >                         |                  MAINT/resolution
> >             ------------|------------------------------------
> > (5)          BRK        |  DOWN/L4TOUT     DOWN/L4TOUT
> >              #VALID     |                  MAINT/resolution
> >                         |                  MAINT/resolution
> >
> >
> > Here it becomes interesting. In (1) both regular and templated proxies
> > are DOWN, of course.
> > However, adding in a second DNS response in (2) brings the templated
> > proxy UP, but the regular one stays DOWN. Only when in (3) the valid
> > response is the only one presented, does it go UP as well. Adding the
> > broken one back (4) is of no consequence then. And again, after
> > leaving just the broken response (5), both correctly go DOWN.
> >
> > So it would appear that if haproxy starts with just a single "broken"
> > DNS response, adding a healthy one later on is not recognized. Instead,
> > it stays DOWN. "Replacing" the single broken response with a single
> > "valid" response, however, brings it to life, and it won't be
> > discouraged by bringing the broken one back in.
> >
> > Tests 1 and 2 make sense to me, but test 3 I don't understand. For now,
> > I have worked around the issue by defining all my relevant backends with
> > server-template and at least 2 slots, but I would still like to
> > understand it. And maybe it is a bug, after all ;)
> >
> > Kind regards, and thanks for a great piece of software!
> >
> > Daniel
> >
> >
> >
> >
> >
> >> On 21. Mar 2019, at 14:28, Bruno Henc <[email protected]> wrote:
> >>
> >> Hello Daniel,
> >>
> >>
> >> You might be missing the "hold valid" directive in your resolvers
> >> section:
> >> https://www.haproxy.com/documentation/hapee/1-9r1/onepage/#5.3.2-timeout
> >>
> >> This should force HAProxy to fetch the DNS record values from the
> >> resolver.
> >>
> >> A reload of the HAProxy instance also forces the instances to query all
> >> records from the resolver.
> >>
> >> Can you please retest with the updated configuration and report back
> >> the results?
> >>
> >>
> >> Best regards,
> >>
> >> Bruno Henc
> >>
> >> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >> On Thursday, March 21, 2019 12:09 PM, Daniel Schneller <[email protected]> wrote:
> >>
> >>> Hello!
> >>>
> >>> Friendly bump :)
> >>> I'd be willing to amend the documentation once I understand what's
> >>> going on :D
> >>>
> >>> Cheers,
> >>> Daniel
> >>>
> >>>> On 18. Mar 2019, at 20:28, Daniel Schneller [email protected] wrote:
> >>>> Hi everyone!
> >>>>
> >>>> I assume I am misunderstanding something, but I cannot figure out
> >>>> what it is.
> >>>> We are using haproxy in AWS, in this case as sidecars to applications
> >>>> so they need not know about changing backend addresses at all, but can
> >>>> always talk to localhost.
> >>>> Haproxy listens on localhost and then forwards traffic to an ELB
> >>>> instance.
> >>>> This works great, but there have been two occasions now where, due to
> >>>> a change in the ELB's IP addresses, our services went down because the
> >>>> backends could not be reached anymore. I don't understand why haproxy
> >>>> sticks to the old IP address instead of going to one of the updated
> >>>> ones.
> >>>>
> >>>> There is a resolvers section which points to the local dnsmasq
> >>>> instance (there to send some requests to consul, but that's not used
> >>>> here). All other traffic is forwarded on to the AWS DNS server set via
> >>>> DHCP.
> >>>>
> >>>> I managed to get timely updates and updated backend servers when
> >>>> using server-template, but from what I understand this should not
> >>>> really be necessary for this.
> >>>>
> >>>> This is the trimmed down sidecar config. I have not made any changes
> >>>> to dns timeouts etc.
> >>>>
> >>>>     resolvers default
> >>>>         # dnsmasq
> >>>>         nameserver local 127.0.0.1:53
> >>>>
> >>>>     listen regular
> >>>>         bind 127.0.0.1:9300
> >>>>         option dontlog-normal
> >>>>         server lb-internal loadbalancer-internal.xxx.yyy:9300 resolvers default check addr loadbalancer-internal.xxx.yyy port 9300
> >>>>
> >>>>     listen templated
> >>>>         bind 127.0.0.1:9200
> >>>>         option dontlog-normal
> >>>>         option httpchk /haproxy-simple-healthcheck
> >>>>         server-template lb-internal 2 loadbalancer-internal.xxx.yyy:9200 resolvers default check port 9299
> >>>>
> >>>> To simulate changing ELB addresses, I added entries for
> >>>> loadbalancer-internal.xxx.yyy in /etc/hosts, to be able to control
> >>>> them via dnsmasq.
> >>>> I tried different scenarios, but could not reliably predict what
> >>>> would happen in all cases.
> >>>>
> >>>> The address ending in 52 (marked as "valid" below) is a currently (as
> >>>> of the time of testing) valid IP for the ELB. The one ending in 199
> >>>> (marked "invalid") is an unused private IP address in my VPC.
> >>>>
> >>>> Starting with /etc/hosts:
> >>>>
> >>>>     10.205.100.52 loadbalancer-internal.xxx.yyy # valid
> >>>>     10.205.100.199 loadbalancer-internal.xxx.yyy # invalid
> >>>>
> >>>> haproxy starts and reports:
> >>>>
> >>>>     regular:   lb-internal  UP/L7OK
> >>>>     templated: lb-internal1 DOWN/L4TOUT
> >>>>                lb-internal2 UP/L7OK
> >>>>
> >>>> That's expected. Now when I edit /etc/hosts to only contain the
> >>>> invalid address and restart dnsmasq, I would expect both proxies to go
> >>>> fully down. But only the templated proxy behaves like that:
> >>>>
> >>>>     regular:   lb-internal  UP/L7OK
> >>>>     templated: lb-internal1 DOWN/L4TOUT
> >>>>                lb-internal2 MAINT (resolution)
> >>>>
> >>>> Reloading haproxy in this state leads to:
> >>>>
> >>>>     regular:   lb-internal  DOWN/L4TOUT
> >>>>     templated: lb-internal1 MAINT (resolution)
> >>>>                lb-internal2 DOWN/L4TOUT
> >>>>
> >>>> After fixing /etc/hosts to include the valid server again and
> >>>> restarting dnsmasq:
> >>>>
> >>>>     regular:   lb-internal  DOWN/L4TOUT
> >>>>     templated: lb-internal1 UP/L7OK
> >>>>                lb-internal2 DOWN/L4TOUT
> >>>>
> >>>> Shouldn't the regular proxy also recognize the change and bring the
> >>>> backend up or down depending on the DNS change? I have waited for
> >>>> several health check rounds (seeing "* L4TOUT" and "L4TOUT" toggle),
> >>>> but it still never updates.
> >>>>
> >>>> I also tried to have only the invalid address in /etc/hosts, then
> >>>> restarting haproxy.
> >>>> The regular backends will never recognize it when I add the valid one
> >>>> back in.
> >>>> The templated one does, unless I set it up to have only 1 instead of
> >>>> 2 server slots.
> >>>> In that case it will also only pick up the valid server when reloaded.
> >>>> On the other hand, it will recognize on the next health check, without
> >>>> a reload, when I remove the valid server, but it will not bring it
> >>>> back in and make the proxy UP when it comes back.
> >>>>
> >>>> I assume my understanding of something here is broken, and I would
> >>>> gladly be told about it :)
> >>>>
> >>>> Thanks a lot!
> >>>> Daniel
> >>>>
> >>>> Version Info:
> >>>>
> >>>> --------------
> >>>>
> >>>> $ haproxy -vv
> >>>> HA-Proxy version 1.8.19-1ppa1~trusty 2019/02/12
> >>>> Copyright 2000-2019 Willy Tarreau [email protected]
> >>>> Build options :
> >>>> TARGET = linux2628
> >>>> CPU = generic
> >>>> CC = gcc
> >>>> CFLAGS = -O2 -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
> >>>> OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1
> >>>> Default settings :
> >>>> maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> >>>> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> >>>> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> >>>> OpenSSL library supports TLS extensions : yes
> >>>> OpenSSL library supports SNI : yes
> >>>> OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
> >>>> Built with Lua version : Lua 5.3.1
> >>>> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
> >>>> Encrypted password support via crypt(3): yes
> >>>> Built with multi-threading support.
> >>>> Built with PCRE version : 8.31 2012-07-06
> >>>> Running on PCRE version : 8.31 2012-07-06
> >>>> PCRE library supports JIT : no (libpcre build without JIT?)
> >>>> Built with zlib version : 1.2.8
> >>>> Running on zlib version : 1.2.8
> >>>> Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> >>>> Built with network namespace support.
> >>>> Available polling systems :
> >>>> epoll : pref=300, test result OK
> >>>> poll : pref=200, test result OK
> >>>> select : pref=150, test result OK
> >>>> Total: 3 (3 usable), will use epoll.
> >>>> Available filters :
> >>>> [SPOE] spoe
> >>>> [COMP] compression
> >>>> [TRACE] trace
> >>>> --
> >>>> Daniel Schneller
> >>>> Principal Cloud Engineer
> >>>> CenterDevice GmbH
> >>>> Rheinwerkallee 3
> >>>> 53227 Bonn
> >>>> www.centerdevice.com
> >>>>
> >>>> Management: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael
> >>>> Rosbach, Commercial register no.: HRB 18655, Register court: Bonn,
> >>>> VAT ID: DE-815299431
> >>>> This e-mail, including any attached files, contains confidential
> >>>> and/or legally protected information. If you are not the intended
> >>>> recipient or have received this e-mail in error, please inform the
> >>>> sender immediately and delete this e-mail and any attached files
> >>>> without delay. Unauthorized copying, use, or opening of any attached
> >>>> files, as well as unauthorized forwarding of this e-mail, is not
> >>>> permitted.
> >>
> >
>
>
