Prior to the discussion on this thread, I was under the impression that Kea HA
would fail over (reach the partner-down state) any time max-unacked-clients was
exceeded.  As others pointed out, and as I found in testing, this does not
occur if the servers can successfully communicate with each other but clients
are unable to reach one of the servers.  This scenario can occur any time there
is a network disruption between the clients and one of the Kea servers (or the
primary server in the case of hot-standby).
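To make the behavior concrete, here is a rough Python sketch of the decision as
I understand it from testing and from the HA documentation.  It is a simplified
model, not Kea's code; PartnerView and should_enter_partner_down are made-up
names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartnerView:
    """What one server currently believes about its partner (hypothetical type)."""
    communication_interrupted: bool  # server-to-server link considered down
    unacked_clients: int             # clients the partner apparently failed to serve


def should_enter_partner_down(view: PartnerView, max_unacked_clients: int) -> bool:
    """Simplified model of the observed failover condition."""
    # While the two servers can still talk to each other, unacked clients
    # are not considered at all, so failover never triggers.
    if not view.communication_interrupted:
        return False
    # My understanding of the docs: max-unacked-clients of 0 means
    # "fail over as soon as communication is interrupted".
    if max_unacked_clients == 0:
        return True
    # Otherwise the count has to exceed the configured limit.
    return view.unacked_clients > max_unacked_clients
```

The first branch is the case described above: the servers still reach each
other, so no number of unanswered clients leads to partner-down.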

The problem exists in both load-balancing and hot-standby HA configurations.
In a load-balancing configuration, only approximately half of the clients will
be serviced if clients are only able to reach one server in the HA pair.  In a
hot-standby configuration, none of the clients will be serviced by the standby
server if they are unable to reach the primary server.

While testing this scenario, I found that the status of the servers shows they
do not appear to check or track unacked clients unless communication with the
partner has failed (“communication-interrupted” is true).  The counters for
both unacked-clients and unacked-clients-left are always 0, regardless of the
“secs” field in the DHCP requests, until communication-interrupted becomes
true.
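A small Python sketch of the counting behavior I observed, in case it helps.
This is only a model under my own assumptions: UnackedTracker and its methods
are invented names, and I am assuming the “secs” value (seconds) is compared
against the max-ack-delay setting (milliseconds) with a strict greater-than.

```python
class UnackedTracker:
    """Hypothetical model of how unacked clients seem to be counted."""

    def __init__(self, max_ack_delay_ms: int) -> None:
        self.max_ack_delay_ms = max_ack_delay_ms
        self.communication_interrupted = False
        self._unacked: set[str] = set()

    def observe(self, client_id: str, secs: int) -> None:
        """Look at one client request and its "secs" field."""
        # Observed behavior: requests are not analyzed at all while the
        # partner is still reachable, so the counter stays at 0.
        if not self.communication_interrupted:
            return
        # Assumption: a client counts as unacked once it has been retrying
        # for longer than max-ack-delay.
        if secs * 1000 > self.max_ack_delay_ms:
            self._unacked.add(client_id)

    @property
    def unacked_clients(self) -> int:
        # Each client is counted once, however many times it retries.
        return len(self._unacked)
```

With communication_interrupted left at False, observe() can be called with any
“secs” value and unacked_clients never moves, which matches what the status
output showed in my tests.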

I can appreciate the difficulty in determining the logic behind the current
behavior, as there are challenges in detecting partner-down without causing a
split-brain situation.  I am wondering if the Kea “partner-down” logic can be
improved by assessing the data sent in heartbeats and lease updates.  That is,
if a server is not seeing lease updates from its partner for the same DHCP
requests that it is ignoring (with “secs” exceeded) because it assumes the
partner is servicing them, the state of one or both servers could be changed.

Example:
server1 is not receiving DHCP requests from clients but is communicating with
server2.  server2 is receiving DHCP requests from all clients but is ignoring
some of them because those clients should be serviced by server1 (per the
internal algorithm based on client ID).
If server2 sees that server1 is not sending any lease updates for those client
requests, server1 is put into a state that allows server2 to service the
requests.
The tricky part would be determining if/when server1 should automatically
change state once it begins to see client requests again.  Perhaps an option to
put the node into a maintenance mode that requires manually re-enabling
server1?
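
A rough Python sketch of the check I have in mind, from server2's point of
view.  None of this exists in Kea as far as I know; PartnerServiceMonitor, its
methods, and all three thresholds are hypothetical and only meant to illustrate
the idea.

```python
class PartnerServiceMonitor:
    """Hypothetical: track requests we ignore that the partner never serves."""

    def __init__(self, secs_threshold: int, max_unserved_clients: int,
                 grace_seconds: float) -> None:
        self.secs_threshold = secs_threshold              # "secs" limit per request
        self.max_unserved_clients = max_unserved_clients  # tolerated stale clients
        self.grace_seconds = grace_seconds                # wait for a lease update
        self._waiting: dict[str, float] = {}              # client -> first seen time

    def on_ignored_request(self, client_id: str, secs: int, now: float) -> None:
        """Called when we drop a request that belongs to the partner's scope."""
        if secs > self.secs_threshold:
            # Remember only the first time this client looked stuck.
            self._waiting.setdefault(client_id, now)

    def on_partner_lease_update(self, client_id: str) -> None:
        """Called when the partner reports a lease for this client."""
        self._waiting.pop(client_id, None)

    def partner_not_serving(self, now: float) -> bool:
        """True if too many clients stayed unserved past the grace period."""
        stale = [c for c, first_seen in self._waiting.items()
                 if now - first_seen >= self.grace_seconds]
        return len(stale) > self.max_unserved_clients
```

When partner_not_serving() returns True, server2 would move server1 into
whatever state lets server2 answer those clients.  The sketch deliberately
leaves out the hard part, which is deciding when server1 may come back.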

