ELB scaling => sudden backend tragedy

Jim Freeman Thu, 24 Oct 2019 08:53:54 -0700

Yesterday we had an ELB scale to 26 IP addresses, at which point ALL of the
servers in that backend were suddenly marked down, e.g.:


   Server www26 is going DOWN for maintenance (unspecified DNS error)

Ergo, ALL requests to that backend got 503s ==> complete outage

Mayhap src/dns.c::dns_validate_dns_response() bravely running away when
DNS_RESP_TRUNCATED (skipping parsing of the partial list of servers,
abandoning TTL updates to perfectly good endpoints) is not the best course
of action?

Of course we'll hope (MTUs allowing) that we'll be able to paper this over
for a while by setting accepted_payload_size in the resolvers section to
something larger than the default (512 bytes).
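
For concreteness, a minimal sketch of that workaround. The section name,
nameserver address, backend name and hostname below are placeholders, not
our actual config; 8192 is the maximum the haproxy docs allow for
accepted_payload_size, and whether responses that large actually arrive
still depends on the resolver and the path MTU:

   resolvers awsdns                        # hypothetical section name
      nameserver vpc 10.0.0.2:53           # placeholder resolver address
      accepted_payload_size 8192           # announce a larger EDNS0 buffer than 512
      hold valid 10s

   backend www_backend                     # hypothetical backend name
      # placeholder hostname; 30 slots leaves headroom above 26 addresses
      server-template www 30 my-elb.example.com:80 check resolvers awsdns init-addr none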

But as-is, this looks to be an avoidable pathology?

Thoughts?

Yours, endlessly impressed with haproxy,
...jfree

https://packages.debian.org/stretch-backports/haproxy
1.8.19-1~bpo9+1
