I had come to think that haproxy was not parsing a Truncate-flagged DNS response that had usable entries in it.
After further investigation, tcpdump made clear that the truncated DNS response enumerated *no* 'A' records, expecting the client to switch to TCP for the query.

So we'll be looking at a safe non-default for accepted_payload_size to address this issue in future.

Thanks to all,
...jfree

On Thu, Oct 24, 2019 at 2:29 PM Jim Freeman <sovr...@gmail.com> wrote:
> https://github.com/haproxy/haproxy/issues/341
>
> On Thu, Oct 24, 2019 at 11:44 AM Lukas Tribus <li...@ltri.eu> wrote:
>
>> Hello,
>>
>> On Thu, Oct 24, 2019 at 5:53 PM Jim Freeman <sovr...@gmail.com> wrote:
>> >
>> > Yesterday we had an ELB scale to 26 IP addresses, at which time ALL of the servers in that backend were suddenly marked down, e.g.:
>> >
>> >   Server www26 is going DOWN for maintenance (unspecified DNS error)
>> >
>> > Ergo, ALL requests to that backend got 503s ==> complete outage
>> >
>> > Mayhap src/dns.c::dns_validate_dns_response() bravely running away when DNS_RESP_TRUNCATED (skipping parsing of the partial list of servers, abandoning TTL updates to perfectly good endpoints) is not the best course of action?
>> >
>> > Of course we'll hope (MTUs allowing) that we'll be able to paper this over for awhile using an accepted_payload_size > default(512).
>>
>> I agree this is basically a ticking time-bomb for everyone not thinking about the DNS payload size every single day.
>>
>> However we also need to make sure people will become aware of it when they are hitting truncation size. This would have to be at least a warning on critical syslog level.
>>
>> Reliable DNS resolution for everyone without surprises will only happen with TCP based DNS:
>> https://github.com/haproxy/haproxy/issues/185
>>
>> For the issue in question on the other hand: can you file a bug on github?
>>
>> Thanks,
>> Lukas
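[For the archives, the non-default mentioned above is set per "resolvers" section in haproxy.cfg. A sketch of the shape of the fix; the resolver address, backend, and server names here are hypothetical, not from our config:]

```
resolvers mydns
    nameserver dns1 10.0.0.2:53
    # Advertise an EDNS0 payload size up to the 8192-byte maximum instead of
    # the 512-byte default, so large A-record sets (e.g. a scaled-out ELB)
    # are less likely to come back truncated -- path MTU permitting.
    accepted_payload_size 8192

backend app
    # Resolve the ELB name at runtime via the resolvers section above.
    server-template www 1-30 my-elb.example.com:80 check resolvers mydns init-addr none
```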
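[For anyone reproducing the tcpdump finding: the shape of the response in question is visible directly in the 12-byte DNS header, where the TC (Truncated) bit can be set while ANCOUNT is zero. A minimal sketch in Python using only the standard library; the header bytes below are synthetic, not captured from the incident:]

```python
import struct

def parse_dns_header(packet: bytes):
    """Parse the 12-byte DNS header; return (tc, ancount).

    tc      -- True if the Truncated (TC) bit is set in the flags word
    ancount -- number of answer records the response claims to carry
    """
    if len(packet) < 12:
        raise ValueError("DNS packet shorter than 12-byte header")
    _id, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", packet[:12])
    tc = bool(flags & 0x0200)  # TC is bit 9 of the 16-bit flags field
    return tc, ancount

# Synthetic response header: QR=1 (0x8000) plus TC=1 (0x0200), one question,
# zero answers -- the shape described above, where the resolver sets TC and
# omits the A records entirely, expecting the client to retry over TCP.
truncated_empty = struct.pack("!6H", 0x1234, 0x8200, 1, 0, 0, 0)
tc, answers = parse_dns_header(truncated_empty)
print(tc, answers)  # True 0
```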