Indeed, I couldn't find this behavior described in the documentation.
I am not sure to what extent the documentation should cover it. For us
it is important to understand how it works, so that we can build a more
predictable and reliable load-balancing layer.
Also, if the behavior has changed or is going to change in newer
versions, a few words about that in the documentation would be nice
as well.
On 08/04/2024 10:45, Илья Шипицин wrote:
Am I right that you consider this a documentation bug?
On Mon, 8 Apr 2024 at 10:44, Andrii Ustymenko <andrii.ustyme...@adyen.com> wrote:
Yes, for question 1) indeed.
Basically, I have tested with a local, "out of sync" custom
nameserver, and I observed inconsistent results in the
backend-servers table. That led to this question.
Most of the time I saw only the state from the local nameserver.
However, sometimes I saw a "merged" state where the records from all
the replies existed together in the table.
I also observed that haproxy sends the same number of requests to
all nameservers, even though the local one normally replies faster.
And sorry, I forgot to mention that we are running haproxy version 2.8.7.
On 08/04/2024 10:31, Илья Шипицин wrote:
And in particular, your question is "does HAProxy merge all
responses, pick the first one, or use some other approach"?
On Mon, 8 Apr 2024 at 10:23, Andrii Ustymenko
<andrii.ustyme...@adyen.com> wrote:
I guess it is indeed not about consul-template specifically, but
more about the rendered templates and how haproxy maintains them.
On 06/04/2024 20:15, Илья Шипицин wrote:
Consul template is something done by consul itself; after
that, haproxy.conf is rendered.
Do you mean "how haproxy deals with the rendered template"?
On Fri, Apr 5, 2024, 15:02 Andrii Ustymenko
<andrii.ustyme...@adyen.com> wrote:
Dear list!
My name is Andrii. I work for Adyen. We are using haproxy as our main
software load balancer at quite a large scale.
As of now, our main use case is backend routing based on
server-template and dynamic DNS from consul as service discovery. Below
is an example of a simple backend configuration:
```
backend example
    balance roundrobin
    server-template application 10 _application._tcp.consul resolvers someresolvers init-addr last,libc,none resolve-opts allow-dup-ip resolve-prefer ipv4 check ssl verify none
```
and in global configuration
```
resolvers someresolvers
    nameserver ns1 10.10.10.10:53
    nameserver ns2 10.10.10.11:53
```
As we see it, haproxy creates an internal table for backends with some
be_id and be_name=example, and allocates 10 records, one per server,
with se_id from 1 to 10. Those records are then populated and updated
with the data from the resolvers.
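One way to look at that table from the outside is the `show servers state` command on the stats socket. The following is a minimal sketch, not part of the original mail, that parses its output by reading the column names from the `#` header line instead of hard-coding positions. It assumes that the `srv_id` column is what this mail calls se_id, and that the `be_name`, `srv_name`, `srv_fqdn` and `srv_admin_state` columns are present in your version's output; the function names are made up for illustration.

```python
def parse_servers_state(text):
    """Parse `show servers state` output into a list of dicts.

    Column names are taken from the '# be_id be_name ...' header line,
    so nothing here depends on a fixed column order.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            columns = line.lstrip("#").split()
            continue
        if columns is None:
            # Lines before the header (e.g. the format version) are skipped.
            continue
        rows.append(dict(zip(columns, line.split())))
    return rows


def slot_summary(rows):
    """Return (backend, slot id, server name, fqdn, admin state) per slot.

    Assumption: srv_id is the slot number called se_id in this thread.
    Missing columns are reported as None rather than raising.
    """
    summary = []
    for row in rows:
        summary.append((
            row.get("be_name"),
            int(row["srv_id"]) if "srv_id" in row else None,
            row.get("srv_name"),
            row.get("srv_fqdn"),
            int(row["srv_admin_state"]) if "srv_admin_state" in row else None,
        ))
    return summary
```

The input text could be captured with something like `echo "show servers state example" | nc -U /var/lib/haproxy/stats` and passed to `parse_servers_state`. Comparing two snapshots taken a few seconds apart would show which slot each SRV target currently occupies.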
I would like to understand a couple of things about this structure and
how it works, which I could not figure out myself from the source code:
1) In tcpdump for DNS queries we see that haproxy polls all the
nameservers simultaneously and asynchronously. For instance:
```
11:06:17.587798 eth2 Out ifindex 4 aa:aa:aa:aa:aa:aa ethertype IPv4 (0x0800), length 108: 10.10.10.50.24050 > 10.10.10.10.53: 34307+ [1au] SRV? _application._tcp.consul. (60)
11:06:17.587802 eth2 Out ifindex 4 aa:aa:aa:aa:aa:aa ethertype IPv4 (0x0800), length 108: 10.10.10.50.63155 > 10.10.10.11.53: 34307+ [1au] SRV? _application._tcp.consul. (60)
11:06:17.588097 eth2 In ifindex 4 ff:ff:ff:ff:ff:ff ethertype IPv4 (0x0800), length 205: 10.10.10.10.53 > 10.10.10.50.24050: 2194 2/0/1 SRV 0a5099e5.addr.consul.:25340 1 1, SRV 0a509934.addr.consul.:26010 1 1 (157)
11:06:17.588097 eth2 In ifindex 4 ff:ff:ff:ff:ff:ff ethertype IPv4 (0x0800), length 205: 10.10.10.11.53 > 10.10.10.50.63155: 2194 2/0/1 SRV 0a5099e5.addr.consul.:25340 1 1, SRV 0a509934.addr.consul.:26010 1 1 (157)
```
Both nameservers reply with the same response. But what if they are
out of sync? Let's say one says server1, server2 and the second one
says server2, server3. So far, testing this locally, I see that
sometimes the reply overrides the table, but sometimes it seems to
just get merged with the rest.
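To make the two observed outcomes concrete, here is a toy sketch of the two candidate behaviors, using the server1/server2/server3 example above. It is not HAProxy's implementation and does not claim which behavior HAProxy actually uses; `override` and `merge` are hypothetical names that only spell out what "overrides the table" and "gets merged" would mean.

```python
def override(table, reply):
    """Candidate A: the most recent reply replaces the whole table."""
    return list(reply)


def merge(table, reply):
    """Candidate B: records from the reply are added to those already present."""
    merged = list(table)
    for record in reply:
        if record not in merged:
            merged.append(record)
    return merged


# Replies from two out-of-sync nameservers, as in the example above.
ns1_reply = ["server1", "server2"]
ns2_reply = ["server2", "server3"]

# Apply the ns1 reply first, then the ns2 reply, under each candidate.
table_override = override(override([], ns1_reply), ns2_reply)
table_merge = merge(merge([], ns1_reply), ns2_reply)

print(table_override)  # ['server2', 'server3']
print(table_merge)     # ['server1', 'server2', 'server3']
```

Under candidate A the result depends on which reply is processed last, which would match seeing mostly the state of one nameserver; under candidate B the order does not matter for the final set.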
2) Each entry from an SRV reply is placed into the table under a
specific se_id. Most of the time that placement won't change. So, for
the example above, 0a5099e5.addr.consul. and 0a509934.addr.consul. will
most likely have se_id 1 and 2 respectively. However, sometimes we
have the following scenario:
1. We administratively disable the server (drain traffic) with the
following command:
```
echo "set server example/application1 state maint" | nc -U /var/lib/haproxy/stats
```
The MAINT flag is added to the record with se_id 1.
2. The application instance goes down and gets de-registered from
consul, so it is also evicted from SRV replies and drops out of
haproxy's discovery.
3. The application instance comes back up, gets registered by consul
and is discovered by haproxy, but haproxy allocates a different se_id
for it. Haproxy health checks control the traffic to it in this case.
4. We still have se_id 1 with the MAINT flag, while the application
instance's DNS record is placed into a different se_id.
The problem is that any newly discovered record which gets placed into
se_id 1 will never be active until either the command:
```
echo "set server example/application1 state ready" | nc -U /var/lib/haproxy/stats
```
is executed or haproxy gets reloaded without a state file. With this
pattern we basically have persistent "records pollution" due to
operations made directly through the control socket.
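The pollution effect can be reproduced with a toy model of the slot table. This is only a sketch under stated assumptions: slots are filled first-free, and the MAINT flag belongs to the slot rather than to the record bound to it. HAProxy's real allocation policy may differ, and `new-instance.addr.consul.` is a made-up record name. The order of events is slightly simplified compared to steps 1 to 4: an unrelated new record takes se_id 1 before the original instance returns.

```python
class Slot:
    def __init__(self, se_id):
        self.se_id = se_id
        self.record = None   # SRV target currently bound to this slot
        self.maint = False   # admin flag set through the control socket


def bind(slots, record):
    """Bind a record to the first free slot (assumed policy).

    The slot's maint flag is deliberately left untouched.
    """
    for slot in slots:
        if slot.record is None:
            slot.record = record
            return slot
    raise RuntimeError("no free slot")


def unbind(slots, record):
    """Free the slot holding this record, keeping its admin flag."""
    for slot in slots:
        if slot.record == record:
            slot.record = None


def is_active(slot):
    """A slot can take traffic only if it holds a record and is not in MAINT."""
    return slot.record is not None and not slot.maint


slots = [Slot(i) for i in range(1, 4)]
a = bind(slots, "0a5099e5.addr.consul.")         # lands in se_id 1
b = bind(slots, "0a509934.addr.consul.")         # lands in se_id 2
a.maint = True                                   # "set server ... state maint"
unbind(slots, "0a5099e5.addr.consul.")           # instance de-registered
c = bind(slots, "new-instance.addr.consul.")     # reuses se_id 1, inherits MAINT
a_again = bind(slots, "0a5099e5.addr.consul.")   # original instance returns in se_id 3

print(c.se_id, is_active(c))              # 1 False
print(a_again.se_id, is_active(a_again))  # 3 True
```

In this model the record that nobody put into maintenance stays inactive, while the instance that was drained comes back active in another slot.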
I am not sure whether there is anything to be done about this. Maybe
haproxy could cache not only the state of an se_id but also the record
associated with it, and re-schedule health checks if that record
changes. Or, instead of integer ids, it could use hashed ids based on
the DNS names or IP addresses of the discovered records; in that case
a record would always bind to exactly the same slot.
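The hashed-id idea could look roughly like the sketch below. It is only an illustration of the suggestion, not something HAProxy does: the hash function (SHA-256), the linear probing on collision, and the names `preferred_slot` and `place` are all assumptions.

```python
import hashlib


def preferred_slot(fqdn, nb_slots):
    """Derive a stable slot index from the record name."""
    digest = hashlib.sha256(fqdn.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % nb_slots


def place(slots, fqdn):
    """Place fqdn in its hashed slot, probing linearly on collision.

    `slots` is a list in which None marks a free slot.
    Returns the index that was used.
    """
    nb_slots = len(slots)
    start = preferred_slot(fqdn, nb_slots)
    for step in range(nb_slots):
        idx = (start + step) % nb_slots
        if slots[idx] is None or slots[idx] == fqdn:
            slots[idx] = fqdn
            return idx
    raise RuntimeError("no free slot")
```

With this scheme a record that disappears and comes back lands in the same slot as long as that slot is still free. When two records hash to the same slot, the one placed second is probed to a neighbor, so the binding is stable only in the absence of collisions.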
Thanks in advance!
--
Best regards,
Andrii Ustymenko
--
Andrii Ustymenko
Platform Reliability Engineer
office +31 20 240 12 40
Adyen Headquarters
Simon Carmiggeltstraat 6-50, 5th floor
1011 DJ Amsterdam, The Netherlands
Adyen <https://www.adyen.com>