I had come to think that haproxy was not parsing a Truncate-flagged DNS
response that had usable entries in it.

After further investigation, tcpdump made clear that the truncated DNS
response enumerated *no* 'A' records, expecting the client to switch to TCP
for the query.
So we'll be looking at a safe non-default accepted_payload_size (larger
than the 512-byte default) to address this issue in the future.
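
For anyone finding this thread later, here is a minimal sketch of where that
setting lives. The section name, nameserver address, backend hostname, and
the 8192 value are placeholders I picked for illustration, not our
production settings; what is actually safe depends on the path MTU and on
what the upstream resolver will send over UDP.

```
# Hypothetical resolvers section; names and addresses are placeholders.
resolvers mydns
    nameserver dns1 10.0.0.2:53
    # Announce a larger UDP payload size than the 512-byte default,
    # so that larger answer sets are less likely to come back truncated.
    accepted_payload_size 8192
    hold valid 10s

# Hypothetical backend that resolves its servers through the section above.
backend www
    server-template www 30 my-elb.example.com:80 check resolvers mydns init-addr none
```

This only raises the point at which truncation starts; a large enough answer
set will still be truncated until TCP-based resolution is available.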

Thanks to all,
...jfree

On Thu, Oct 24, 2019 at 2:29 PM Jim Freeman <sovr...@gmail.com> wrote:

> https://github.com/haproxy/haproxy/issues/341
>
> On Thu, Oct 24, 2019 at 11:44 AM Lukas Tribus <li...@ltri.eu> wrote:
>
>> Hello,
>>
>> On Thu, Oct 24, 2019 at 5:53 PM Jim Freeman <sovr...@gmail.com> wrote:
>> >
>> > Yesterday we had an ELB scale to 26 IP addresses, at which time ALL of
>> > the servers in that backend were suddenly marked down, e.g. :
>> >
>> >    Server www26 is going DOWN for maintenance (unspecified DNS error)
>> >
>> > Ergo, ALL requests to that backend got 503s ==> complete outage
>> >
>> > Mayhap src/dns.c::dns_validate_dns_response() bravely running away when
>> > DNS_RESP_TRUNCATED (skipping parsing of the partial list of servers,
>> > abandoning TTL updates to perfectly good endpoints) is not the best course
>> > of action ?
>> >
>> > Of course we'll hope (MTUs allowing) that we'll be able to paper this
>> > over for awhile using an accepted_payload_size >default(512).
>>
>> I agree this is basically a ticking time-bomb for everyone not
>> thinking about the DNS payload size every single day.
>>
>> However we also need to make sure people will become aware of it when
>> they are hitting truncation size. This would have to be at least a
>> warning on critical syslog level.
>>
>>
>> Reliable DNS resolution for everyone without surprises will only
>> happen with TCP based DNS:
>> https://github.com/haproxy/haproxy/issues/185
>>
>> For the issue in question on the other hand: can you file a bug on github?
>>
>>
>>
>> Thanks,
>>
>> Lukas
>>
>
