On Wed, Jul 8, 2020 at 10:01 AM <[email protected]> wrote:
>
> DNSOP WG,
>
> Paul Vixie and I submitted draft-ietf-dnsop-avoid-fragmentation-00.
> Please review it.

Hi!

> UDP requestors and responders SHOULD send DNS responses with
> IP_DONTFRAG / IPV6_DONTFRAG [RFC3542] options, which will yield
> either a silent timeout, or a network (ICMP) error, if the path
> MTU is exceeded.

When the MTU is exceeded, the sender might also receive a plain old
EMSGSIZE error on sendto(). I would love to see an example of the
IP_MTU_DISCOVER settings the authors expect. This option is notoriously
hard to get right.

> The maximum buffer size offered by an EDNS0 initiator SHOULD be
> no larger than the estimated maximum DNS/UDP payload size...

This seems to indicate that EDNS0 over TCP should have a small buffer
size as well. Consider wording like "...buffer size offered by an EDNS0
initiator over UDP...".

> Fragmented DNS/UDP messages may be dropped without IP reassembly

Not sure what this has to do with the draft. Are we worried about
request fragmentation and allowing the DNS server to drop fragmented
requests? Are we worried about response fragmentation?

I have two problems with this proposal.

First, it doesn't mention IPv4 vs IPv6 differences at all. In the IPv4
landscape, fragmentation, while a security issue, is generally fine. In
the IPv6 world, fragmentation is disastrous - packets with extension
headers are known to be dropped.

Second, this proposal assumes that path MTU detection works correctly.
This is surprisingly optimistic. Let's consider IPv6 - in IPv6 a path
MTU smaller than 1500 is very common. Let's say a DNS auth server sent
an IPv6 DNS response packet exceeding the path MTU. An intermediate
router will drop the offending packet and one of three scenarios will
happen:

 - (A) No ICMP PTB message is sent back.
 - (B) ICMP PTB message is sent back, but fails to be delivered.
 - (C) ICMP PTB message is sent back and delivered correctly to the
   server.

All three scenarios are disastrous on the practical internet.

The proposal assumes (A) and (B) will rarely happen, and puts the
responsibility on the DNS client to retry over TCP.
This will cause unnecessary timeouts and degrade the overall quality of
the service.

But perhaps most importantly, even option (C) will *not* result in good
service. Consider a setup with multiple DNS servers behind an ECMP
router, or another L4 load balancer. Even if the returning ICMP hits the
correct server - which is far from obvious - it will update the path MTU
on *one server*. If a client attempts to retry the query, as suggested
by the proposal, it will most likely hit another server, which is not
aware of the non-standard path MTU.

These days DNS auth installations use ECMP routing for load balancing. A
single physical box serving important DNS is a rare occurrence.

In this proposal all three scenarios, (A), (B), and (C), will result in
dropped responses. The DNS client needs to wait for a timeout, retry
over UDP, wait more and eventually retry over TCP. This is bad.

We could fix (C) by making the DNS server capture the ICMP PTB in DNS
server code. The ICMP payload often has enough context for the DNS
server to prepare another reply. This reply of course should be sent
with a lowered MTU.

In other words, I'm asking for capturing ICMP PTB in DNS servers,
unpacking the payload and treating it as another request to be handled.
This would solve (C). Also, note this opens an interesting DDoS vector,
but this is another story.

On Linux it is possible to capture the ICMP PTB without privileges, by
setting IP_RECVERR and inspecting MSG_ERRQUEUE. In IPv4 the PTB messages
often have 520 bytes of payload and in IPv6 1184 bytes. This is enough
context to build another response, without having to wait for any
timeout.

Cheers,
    Marek

_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
