Hi all

[I am reluctant to send this, as it could very well be a stupid
 idea. But at least one person has suggested I discuss it on DNSOP,
 so here it is.]

When a server (plain or EDNS capable) is queried via UDP and determines
that the response won't fit into 512 octets (plain) or the client's UDP
message size (EDNS), it sets TC=1, forcing the client to retry via TCP.

Fragmentation at the IP layer causes issues. Fragmentation can occur
when the PMTU is lower than the advertised EDNS message size. IP
fragments may be dropped by devices on the path, so the UDP datagram
never arrives at the user application. Packets with DF=1 are not
fragmented at all: a router that cannot forward one whole drops it.

With EDNS, when the client message size is small, a response may still
not fit in a single datagram, causing the client to retry using TCP.

----

Can we have the following scheme so that fragmentation is supported at
the application level?

When a server determines that the response doesn't fit into a single
datagram (512 or the client's message size), the server splits the reply
into multiple fragment datagrams (512 or some discovered PMTU that
works) such that:

1. Each datagram is a DNS reply message with identical header field
values (except for section counts) and TC=1 in each of them. The ID
field has the same value among all reply fragments.

2. Each datagram contains part of the RRs that form the complete reply,
split on RR boundaries. The DNS header contains the appropriate section
counts for that datagram. The datagrams need not be equal in size.

3. An additional RR (plain DNS) or pseudo RR (inside OPT) called
FRAGMENT is present in every datagram, with two 16-bit fields containing
the count of fragments and the number of the current fragment. (A DNS
message is limited to 1<<16 octets and a DNS datagram can be at least
512 octets long, which suggests at most 128 fragments, but as the
datagrams can be of different sizes, 16-bit fields are safer for the
fragment count.)

4. A client that doesn't know about this scheme notices TC=1 and retries
with TCP. Such a client should ignore datagrams other than the first
one, as they look like duplicate replies with the same message ID.

5. A client that is aware of this scheme finds TC=1 and the FRAGMENT RR
and does reassembly (similar to IP fragment reassembly such as RFC 815),
DNS messages being limited to 1<<16 octets too.
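
To make the server side of the list above a bit more concrete, here is a
rough Python sketch of the splitting step. It is only an illustration
under assumptions that are not part of the proposal: RRs are modelled as
already-encoded byte strings tagged with their section, name compression
across fragments is ignored, FRAGMENT_OVERHEAD is a made-up wire size for
the FRAGMENT RR, and fragments are numbered starting from 1.

```python
from dataclasses import dataclass

HEADER_SIZE = 12        # fixed DNS header size in octets
FRAGMENT_OVERHEAD = 16  # hypothetical wire size of the FRAGMENT RR


@dataclass
class Fragment:
    msg_id: int     # same ID in every fragment
    index: int      # current fragment number (assumed to start at 1)
    count: int      # total number of fragments
    rrs: list       # (section, wire_bytes) pairs carried by this fragment
    tc: bool = True  # TC=1 in every fragment

    def section_counts(self):
        # Per-datagram section counts for the DNS header.
        counts = {"answer": 0, "authority": 0, "additional": 0}
        for section, _ in self.rrs:
            counts[section] += 1
        return counts


def split_reply(msg_id, rrs, max_size=512):
    """Greedily pack whole RRs into datagrams of at most max_size octets."""
    budget = max_size - HEADER_SIZE - FRAGMENT_OVERHEAD
    chunks, current, used = [], [], 0
    for section, wire in rrs:
        if len(wire) > budget:
            # A single RR must fit in one datagram; otherwise fall back to TCP.
            raise ValueError("RR larger than datagram size, TCP required")
        if current and used + len(wire) > budget:
            chunks.append(current)
            current, used = [], 0
        current.append((section, wire))
        used += len(wire)
    if current:
        chunks.append(current)
    total = len(chunks)
    return [Fragment(msg_id, i, total, chunk)
            for i, chunk in enumerate(chunks, start=1)]
```

The greedy packing is just the simplest choice; since the datagrams need
not be equal in size, a server could equally balance RRs across fragments.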

This scheme still restricts the size of a single RR to the datagram
size. Reassembly (unlike IP fragments) doesn't require offsets such as
used in RFC 815 as RRs are wholly contained inside one datagram.
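
As a rough illustration of why no offsets are needed, client-side
reassembly could look something like the Python sketch below. The
Reassembler class, keying on (peer, message ID), and numbering fragments
from 1 are all my assumptions for the example; timeouts and validation of
the other header fields are left out.

```python
class Reassembler:
    """Collects reply fragments until every fragment number has arrived."""

    def __init__(self):
        # (peer, msg_id) -> {"count": total, "parts": {index: rrs}}
        self._pending = {}

    def add(self, peer, msg_id, index, count, rrs):
        """Return the complete RR list once all fragments are in, else None."""
        if not 1 <= index <= count:
            raise ValueError("fragment number out of range")
        key = (peer, msg_id)
        entry = self._pending.setdefault(key, {"count": count, "parts": {}})
        if entry["count"] != count:
            raise ValueError("fragments disagree on the fragment count")
        # Keep the first copy of each fragment; later duplicates are ignored.
        entry["parts"].setdefault(index, list(rrs))
        if len(entry["parts"]) < count:
            return None
        del self._pending[key]
        # RRs never span datagrams, so concatenating in fragment order suffices.
        return [rr for i in range(1, count + 1) for rr in entry["parts"][i]]
```

Fragments may arrive in any order; the only bookkeeping is which fragment
numbers have been seen, rather than the hole list that RFC 815 maintains.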

TSIG can also be made to work with such a scheme on a
fragment-by-fragment basis.

----

This scheme is not meant to replace TCP. As mentioned above, if a single
RR (for example, a TXT RR containing multiple character-strings) doesn't
fit in a single datagram and truncation happens, TCP is still required.
Nor is it meant to replace EDNS's large datagram sizes. But EDNS replies
can exceed the path MTU and be lost, and when the loss is noticed and
the client retries with a reduced datagram size, truncation can occur
because the message no longer fits.

Some things can still be served by UDP where possible, without all the
baggage of TCP (roundtrips for the SYN/ACK handshake, the connection
remaining in the slow-start phase for most DNS requests, etc.). As an
example, with a maximum fragment datagram size of 512, replies could
traverse a firewall that blocks large replies.

This scheme should be backwards compatible with (ignored by) existing
implementations. Client implementations of this scheme can also signal
support with FRAGMENT 0 0.
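
For illustration only, the two 16-bit fields of FRAGMENT could be packed
as in the Python sketch below. The field order (count first, then current
fragment), network byte order, and the function names are my assumptions;
nothing here fixes an RR type or an EDNS option code.

```python
import struct

_FRAGMENT_FORMAT = "!HH"  # two unsigned 16-bit fields, network byte order


def pack_fragment(count, current):
    """Encode the FRAGMENT data: fragment count, then current fragment."""
    return struct.pack(_FRAGMENT_FORMAT, count, current)


def unpack_fragment(data):
    """Decode FRAGMENT data into a (count, current) tuple."""
    if len(data) != struct.calcsize(_FRAGMENT_FORMAT):
        raise ValueError("FRAGMENT data must be exactly 4 octets")
    return struct.unpack(_FRAGMENT_FORMAT, data)


def is_support_signal(data):
    """True for the 'FRAGMENT 0 0' value a client sends to signal support."""
    return unpack_fragment(data) == (0, 0)
```

A server that doesn't know about FRAGMENT would simply never send it, so
a client sending the 0 0 value loses nothing with older servers.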

                Mukund


_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
