Hi,

A couple of patches for the DNS runtime resolver:
#1 is just a typo cleanup
#2 fixes a "regression" introduced with the parsing of the Additional
section from the SRV record responses. Basically, when HAProxy uses SRV
records and Additional sections together, a server may not recover from its
MAINT status after a scale-down followed by a scale-up.
This can lead to all servers in a backend going into MAINT.
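
For readers who want the failure mode in isolation, here is a minimal toy
model of the behaviour patch #2 aims for. It is only a sketch: the struct,
flag and function names (toy_server, TOY_ADMF_RMAINT, toy_update_status) are
made up for illustration and are not HAProxy's real structures. The real
change is in snr_update_srv_status(), shown in the patch itself.

```c
#include <stdio.h>

/* Hypothetical stand-in for the "maintenance forced by DNS" admin flag. */
#define TOY_ADMF_RMAINT 0x01u

/* Hypothetical, heavily simplified server: only the admin flags matter here. */
struct toy_server {
	unsigned int admin;
};

/*
 * Toy version of the status update for a server fed by Additional records.
 * has_no_ip != 0: the record disappeared, so the server goes into MAINT.
 * has_no_ip == 0: the server has an IP again, so the MAINT flag is cleared.
 * Always returns 1 to signal that the status was handled here.
 */
static int toy_update_status(struct toy_server *s, int has_no_ip)
{
	if (has_no_ip == 0) {
		/* this is the step that was missing before the fix */
		s->admin &= ~TOY_ADMF_RMAINT;
		return 1;
	}

	s->admin |= TOY_ADMF_RMAINT;
	return 1;
}

/* Small helper to print the state while walking through a scenario. */
static void toy_print(const char *step, const struct toy_server *s)
{
	printf("%s: %s\n", step, (s->admin & TOY_ADMF_RMAINT) ? "MAINT" : "READY");
}
```

Calling toy_update_status(&s, 1) followed by toy_update_status(&s, 0) models
a scale-down and then a scale-up: the flag is set, then cleared. Without the
has_no_ip == 0 branch, the second call would leave the server in MAINT.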

Both should be backported to 2.2.

Baptiste
From 04e6e0941f1e84ca3d41dfac00cd253c010a9422 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann <bed...@gmail.com>
Date: Tue, 4 Aug 2020 10:54:14 +0200
Subject: [PATCH 1/2] CLEANUP: dns: typo in reported error message

"record" instead of "recrd".

This should be backported to 2.2.
---
 src/dns.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/dns.c b/src/dns.c
index c8f34874d..c97c7dc69 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -634,10 +634,10 @@ static void dns_check_dns_response(struct dns_resolution *res)
 
 					switch (item->ar_item->type) {
 						case DNS_RTYPE_A:
-							update_server_addr(srv, &(((struct sockaddr_in*)&item->ar_item->address)->sin_addr), AF_INET, "DNS additional recrd");
+							update_server_addr(srv, &(((struct sockaddr_in*)&item->ar_item->address)->sin_addr), AF_INET, "DNS additional record");
 						break;
 						case DNS_RTYPE_AAAA:
-							update_server_addr(srv, &(((struct sockaddr_in6*)&item->ar_item->address)->sin6_addr), AF_INET6, "DNS additional recrd");
+							update_server_addr(srv, &(((struct sockaddr_in6*)&item->ar_item->address)->sin6_addr), AF_INET6, "DNS additional record");
 						break;
 					}
 
-- 
2.17.1

From 3ec65443136714e4549886e5d970b47f0f52b41c Mon Sep 17 00:00:00 2001
From: Baptiste Assmann <bed...@gmail.com>
Date: Tue, 4 Aug 2020 10:57:21 +0200
Subject: [PATCH 2/2] MAJOR: dns: disabled servers through SRV records never
 recover

A regression was introduced by 13a9232ebc63fdf357ffcf4fa7a1a5e77a1eac2b
when I added support for the Additional section of SRV responses.

Basically, when a server is managed through the Additional section of
SRV records and it gets disabled (because its associated Additional
record has disappeared), it never leaves its MAINT state and so never
comes back to production.
This patch updates the "snr_update_srv_status()" function to clear the
MAINT status when the server has an IP address again, and also ensures
this function is called when parsing Additional records (and associating
them with new servers).

This can cause severe outages for people using HAProxy + consul (or any
other service registry) through DNS service discovery.

This should fix issue #793.
This should be backported to 2.2.
---
 src/dns.c    | 3 +++
 src/server.c | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/src/dns.c b/src/dns.c
index c97c7dc69..333780293 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -648,6 +648,9 @@ static void dns_check_dns_response(struct dns_resolution *res)
 				if (msg)
 					send_log(srv->proxy, LOG_NOTICE, "%s", msg);
 
+				/* now we have an IP address associated to this server, we can update its status */
+				snr_update_srv_status(srv, 0);
+
 				srv->svc_port = item->port;
 				srv->flags   &= ~SRV_F_MAPPORTS;
 				if ((srv->check.state & CHK_ST_CONFIGURED) &&
diff --git a/src/server.c b/src/server.c
index 918294b2f..3f26104cc 100644
--- a/src/server.c
+++ b/src/server.c
@@ -3733,6 +3733,12 @@ int snr_update_srv_status(struct server *s, int has_no_ip)
 
 	/* If resolution is NULL we're dealing with SRV records Additional records */
 	if (resolution == NULL) {
+		/* since this server has an IP, it can go back in production */
+		if (has_no_ip == 0) {
+			srv_clr_admin_flag(s, SRV_ADMF_RMAINT);
+			return 1;
+		}
+
 		if (s->next_admin & SRV_ADMF_RMAINT)
 			return 1;
 
-- 
2.17.1
