CCing the list.. sorry.

Eric Graham
DevOps Specialist
Direct: 605.990.1859
[email protected]<mailto:[email protected]>
________________________________
From: Eric Graham <[email protected]>
Sent: Wednesday, January 4, 2023 4:13 PM
To: Kevin P. Fleming <[email protected]>
Subject: Re: [Kea-users] Load-Balancing Network issue between Relay and Kea

You're right. The DUID (if IPv6) is hashed against a table of values, and the 
result modulo the number of servers is used as an index pointing to the 
server that will process the packet.

https://gitlab.isc.org/isc-projects/kea/-/blob/46dc8d276efda1a240f0c05580bdcba62ae5a6c7/src/hooks/dhcp/high_availability/query_filter.cc#L416-L446
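
For illustration, the selection step can be sketched roughly as below. This is 
a minimal sketch in Python, not the Kea source: MIX_TABLE is a stand-in 
permutation rather than the actual table from RFC 3074, and the names lb_hash 
and pick_server are hypothetical. The loop shape (seed with the key length, 
then walk the key bytes from last to first) follows my reading of the hash 
function given in the RFC.

```python
# Stand-in permutation of 0..255. This is NOT the RFC 3074 mixing table;
# it only has the same shape (256 distinct byte values).
MIX_TABLE = [(i * 167 + 13) % 256 for i in range(256)]


def lb_hash(key: bytes) -> int:
    """Table-driven byte hash: seed with the key length, then fold in
    each key byte from last to first through the table."""
    h = len(key) & 0xFF
    for byte in reversed(key):
        h = MIX_TABLE[h ^ byte]
    return h


def pick_server(client_id: bytes, active_servers: int) -> int:
    """Return the index of the server expected to handle this client."""
    if active_servers <= 0:
        raise ValueError("active_servers must be positive")
    return lb_hash(client_id) % active_servers


# Hypothetical DUID, used only to exercise the functions.
duid = bytes.fromhex("000100012a3b4c5d0200000000aa")
print(pick_server(duid, 2))
```

The point is that the choice depends only on the client's identifier and the 
number of servers, so both servers reach the same answer independently without 
exchanging anything per packet.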

Even though the Kea load balancing algorithm (as well as the DHCPd load 
balancing algorithm) is not exactly RFC compliant, this part seems to be. See 
RFC 3074 § 6.

I have encountered this same issue when one server cannot communicate. For me, 
it was partially caused by my socket type being wrong. However, I found the 
load balancing behavior to be sufficiently finicky that I have standardized on 
hot-standby. For the size of deployments I deal with, load balancing provides 
only a marginal performance improvement at the cost of issues like this and 
more complicated configuration.

Additionally, having a RADIUS backend made this issue even worse. Load 
balancing + RADIUS = a bad time.

Eric Graham
DevOps Specialist
Direct: 605.990.1859
[email protected]<mailto:[email protected]>
________________________________
From: Kea-users <[email protected]> on behalf of Kevin P. Fleming 
<[email protected]>
Sent: Wednesday, January 4, 2023 3:59 PM
To: [email protected] <[email protected]>
Subject: Re: [Kea-users] Load-Balancing Network issue between Relay and Kea

On Wed, Jan 4, 2023, at 15:54, Simon wrote:

> Kevin P. Fleming <[email protected]> wrote:
>
>> If 'max-unacked-clients' isn't sufficient to address this, then this leaves 
>> a fairly large opening in the Kea high-availability story, as any network 
>> disruption which causes a server to no longer receive discovery packets from 
>> clients, but otherwise receives all expected network traffic, won't be 
>> noticed except by the clients! This concerns me, as (like other users here) 
>> my Kea servers receive all client traffic via DHCP relays, and 
>> misconfiguration of the relay such that it only relays to one server and not 
>> both will result in half of my clients not getting DHCP service at all.
>
> Surely, if you misconfigure a relay agent in that way, around half your
> clients will initially be unable to renew their leases, but eventually
> will get serviced by the available server once their active lease has
> expired ? That would mean the clients would drop their network config
> momentarily before setting up a new one - meaning that active
> connections would drop, but new ones would connect just fine once the
> new settings are in place.

That's why I posted; I don't really know!

If the server receiving the client requests is not in partner-down state, based 
on my understanding of the Kea ARM section on HA it will not respond to those 
requests. That certainly seems to be the case while the lease is still active; 
once the lease has expired I'm not sure what will happen.

In my network with Kea in load-balancing mode, there seems to be some sort of 
algorithm involved even for DHCP DISCOVER, where only one of the two servers 
responds with DHCP OFFER even though they are both running in a normal state. 
It has been my assumption (untested) up to this point that Kea is using the 
client's identifier (MAC address, DUID, etc.) to choose one or the other of the 
active servers to respond to that DISCOVER. If that's true, and both servers 
are in normal operation (neither is in partner-down), then that algorithm would 
continue telling the second server to *not* respond to requests from that 
client because it believes the other server will do so... even if the other 
server is not receiving the client's requests.
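
The scenario above can be sketched as a toy simulation. This is a sketch under 
the (untested) assumption just described, namely that each server decides from 
the client's identifier alone whether a request is "its" request. The owner 
function is a deliberately simple stand-in for the real hash, and all names 
here are hypothetical.

```python
def owner(mac: bytes, servers: int = 2) -> int:
    """Stand-in for the real selection hash: map a MAC to a server index."""
    return sum(mac) % servers


def simulate(clients, reachable):
    """Split clients into served and unserved.

    `reachable` is the set of server indices the relay actually forwards
    to. A client is served only if the server that owns it is reachable,
    because the other server assumes its partner will answer.
    """
    served, unserved = [], []
    for mac in clients:
        (served if owner(mac) in reachable else unserved).append(mac)
    return served, unserved


# 100 made-up MAC addresses; the relay is misconfigured to reach server 1 only.
clients = [bytes([0x02, 0, 0, 0, 0, n]) for n in range(100)]
served, unserved = simulate(clients, reachable={1})
print(len(served), len(unserved))  # 50 50
```

With both servers in a normal state and only one of them reachable through the 
relay, half of these clients get no answer in this model, which is the gap 
being asked about.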

To summarize, that's what I assumed (again untested) 'max-unacked-clients' is 
for; if the second server assumes the first server will respond to those 
clients, but it does not (no leases are offered to them), it could notice the 
situation and decide that the first server is unhealthy or partitioned and 
force it into a 'down' state.
--
ISC funds the development of this software with paid support subscriptions. 
Contact us at https://www.isc.org/contact/ for more information.

To unsubscribe visit https://lists.isc.org/mailman/listinfo/kea-users.

Kea-users mailing list
[email protected]
https://lists.isc.org/mailman/listinfo/kea-users