Yesterday we had an ELB scale to 26 IP addresses, at which point ALL of the servers in that backend were suddenly marked down, e.g.:
    Server www26 is going DOWN for maintenance (unspecified DNS error)

Ergo, ALL requests to that backend got 503s ==> complete outage.

Mayhap having src/dns.c::dns_validate_dns_response() bravely run away on DNS_RESP_TRUNCATED (skipping the partial list of servers it did receive, and abandoning TTL updates for perfectly good endpoints) is not the best course of action?

Of course, we hope (MTUs allowing) to paper over this for a while by setting accepted_payload_size above the default of 512. But as-is, doesn't this look like an avoidable pathology? Thoughts?

Yours, endlessly impressed with haproxy,
...jfree

https://packages.debian.org/stretch-backports/haproxy
1.8.19-1~bpo9+1
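For context on why roughly two dozen addresses is where a 512-byte response runs out of room, here is a rough back-of-the-envelope sketch of the DNS wire size for an A-record answer. It assumes every answer uses a 2-byte name-compression pointer, the response carries exactly one EDNS OPT record, and there are no authority or additional records. Real responses may differ. The hostname in the example is hypothetical, and the helper names are illustrative only (not haproxy code).

```python
# Rough DNS response-size estimate for N IPv4 A records (RFC 1035 wire format).
# Assumptions: compressed owner names in answers, one EDNS OPT RR, nothing else.

HEADER = 12                            # fixed DNS header
A_RR = 2 + 2 + 2 + 4 + 2 + 4           # name ptr, TYPE, CLASS, TTL, RDLENGTH, IPv4 = 16
OPT_RR = 1 + 2 + 2 + 4 + 2             # root name, TYPE, CLASS, TTL, RDLENGTH = 11


def encoded_name_len(fqdn: str) -> int:
    """Length of an uncompressed name: a length byte per label, plus the root byte."""
    labels = fqdn.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1


def estimate_a_response_size(fqdn: str, n_records: int, edns: bool = True) -> int:
    """Estimated response size in bytes for n_records A records for fqdn."""
    question = encoded_name_len(fqdn) + 4  # QNAME + QTYPE + QCLASS
    opt = OPT_RR if edns else 0
    return HEADER + question + n_records * A_RR + opt


def max_a_records(fqdn: str, limit: int = 512, edns: bool = True) -> int:
    """How many A records fit before the response would exceed `limit` bytes."""
    fixed = estimate_a_response_size(fqdn, 0, edns)
    return (limit - fixed) // A_RR


if __name__ == "__main__":
    name = "internal-my-app-1234567890.us-east-1.elb.amazonaws.com"  # hypothetical
    print(max_a_records(name))                 # records that fit in 512 bytes
    print(estimate_a_response_size(name, 27))  # size with one record too many
```

Under these assumptions, a name of that length leaves room for 26 records in 512 bytes, and a 27th pushes the response to 515 bytes. Any extra records in the real response lower that threshold.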