Consul template is something done by Consul itself; after that, haproxy.conf is rendered.
Do you mean "how haproxy deals with the rendered template"?

On Fri, Apr 5, 2024, 15:02 Andrii Ustymenko <andrii.ustyme...@adyen.com> wrote:

> Dear list!
>
> My name is Andrii. I work for Adyen. We are using haproxy as our main
> software loadbalancer at quite large scale.
> As of now our main use case for backend routing is based on
> server-template and dynamic DNS from consul as service discovery. Below
> is an example of a simple backend configuration:
>
> ```
> backend example
>     balance roundrobin
>     server-template application 10 _application._tcp.consul resolvers someresolvers init-addr last,libc,none resolve-opts allow-dup-ip resolve-prefer ipv4 check ssl verify none
> ```
>
> and in the global configuration
>
> ```
> resolvers someresolvers
>     nameserver ns1 10.10.10.10:53
>     nameserver ns2 10.10.10.11:53
> ```
>
> As we see, haproxy will create an internal table for backends with some
> be_id and be_name=application and allocate 10 records for each server
> with se_id from 1 to 10. Then those records get populated and updated
> with the data from the resolvers.
> I would like to understand a couple of things with regard to this
> structure and how it works, which I could not figure out myself from the
> source code:
>
> 1) In tcpdump for DNS queries we see that haproxy asynchronously polls
> all the nameservers simultaneously. For instance:
>
> ```
> 11:06:17.587798 eth2 Out ifindex 4 aa:aa:aa:aa:aa:aa ethertype IPv4 (0x0800), length 108: 10.10.10.50.24050 > 10.10.10.10.53: 34307+ [1au] SRV? _application._tcp.consul. (60)
> 11:06:17.587802 eth2 Out ifindex 4 aa:aa:aa:aa:aa:aa ethertype IPv4 (0x0800), length 108: 10.10.10.50.63155 > 10.10.10.11.53: 34307+ [1au] SRV? _application._tcp.consul. (60)
> 11:06:17.588097 eth2 In ifindex 4 ff:ff:ff:ff:ff:ff ethertype IPv4 (0x0800), length 205: 10.10.10.10.53 > 10.10.10.50.24050: 2194 2/0/1 SRV 0a5099e5.addr.consul.:25340 1 1, SRV 0a509934.addr.consul.:26010 1 1 (157)
> 11:06:17.588097 eth2 In ifindex 4 ff:ff:ff:ff:ff:ff ethertype IPv4 (0x0800), length 205: 10.10.10.11.53 > 10.10.10.50.63155: 2194 2/0/1 SRV 0a5099e5.addr.consul.:25340 1 1, SRV 0a509934.addr.consul.:26010 1 1 (157)
> ```
>
> Both nameservers reply with the same response. But what if they are out
> of sync? Let's say one says: server1, server2 and the second one says
> server2, server3? So far, testing this locally, I see that sometimes the
> reply overrides the table, but sometimes it seems to just get merged
> with the rest.
>
> 2) Each entry from the SRV reply will be placed into the table under a
> specific se_id. Most of the time that placement won't change. So, for
> the example above, most likely 0a5099e5.addr.consul. and
> 0a509934.addr.consul. will have se_id 1 and 2 respectively. However,
> sometimes we have the following scenario:
>
> 1. We administratively disable the server (drain traffic) with the next
> command:
>
> ```
> echo "set server example/application1 state maint" | nc -U /var/lib/haproxy/stats
> ```
>
> The MAINT flag will be added to the record with se_id 1.
>
> 2. The application instance goes down and gets de-registered from consul,
> so it is also evicted from SRV replies and drops out of haproxy's discovery.
>
> 3. The application instance comes up, gets registered by consul and is
> discovered by haproxy, but haproxy allocates a different se_id for it.
> Haproxy healthchecks will control the traffic to it in this case.
>
> 4. We will still have se_id 1 with the MAINT flag, and the application
> instance's DNS record placed into a different se_id.
>
> The problem is that any newly discovered record which gets placed into
> se_id 1 will never be active until either the command:
>
> ```
> echo "set server example/application1 state ready" | nc -U /var/lib/haproxy/stats
> ```
>
> is executed or haproxy gets reloaded without a state file. With this pattern
> we basically have persistent "records pollution" due to operations made
> directly with the control socket.
>
> I am not sure whether there is anything to do about this. Maybe haproxy
> could cache the state not only of the se_id but also of the record associated
> with it, and then, if that gets changed, re-schedule healthchecks. Or, instead
> of integer ids, use some hashed ids based on the DNS names/IP addresses of
> discovered records; in this case binding will happen exactly in the same
> slot.
>
> Thanks in advance!
>
> --
>
> Best regards,
>
> Andrii Ustymenko
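
To make the "hashed ids" idea at the end of the quoted message concrete, here is a minimal sketch in Python. It is not how haproxy assigns se_id today and is not taken from its source code; the function names (preferred_slot, assign, release), the use of SHA-256 and the linear probing on collision are all assumptions made for illustration. The slot count of 10 mirrors the server-template line in the quoted config.

```python
import hashlib

SLOTS = 10  # assumption: mirrors "server-template application 10"


def preferred_slot(target: str, slots: int = SLOTS) -> int:
    """Map a DNS target name to a deterministic slot index in [0, slots)."""
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


def assign(table: dict, target: str, slots: int = SLOTS) -> int:
    """Place target in its preferred slot, probing linearly on collision.

    table maps slot index -> target name. Returns the slot used.
    Raises RuntimeError when every slot is occupied by another target.
    """
    # If the target is already present, keep its current slot.
    for slot, name in table.items():
        if name == target:
            return slot
    start = preferred_slot(target, slots)
    for offset in range(slots):
        slot = (start + offset) % slots
        if slot not in table:
            table[slot] = target
            return slot
    raise RuntimeError("no free slot")


def release(table: dict, target: str) -> None:
    """Remove target from the table (record vanished from the SRV reply)."""
    for slot, name in list(table.items()):
        if name == target:
            del table[slot]


if __name__ == "__main__":
    table = {}
    first = assign(table, "0a5099e5.addr.consul.")
    release(table, "0a5099e5.addr.consul.")   # instance de-registered
    again = assign(table, "0a5099e5.addr.consul.")  # instance comes back
    print(first, again)  # the two indices are equal in this run
```

With such a scheme, a record that leaves and later returns lands in the same slot as long as that slot is still free, so a flag set on the slot would stay attached to the same record. When two names hash to the same slot, the probing step means the guarantee no longer holds, so this only reduces the problem rather than removing it.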