Re: [dns-wg] Draft of RIPE DNS Resolver Best Common Practices

Petr Špaček Wed, 29 Nov 2023 02:13:19 -0800

Hello everyone,

thank you for the hard work on this. I think it's a well-written document.


More substantial feedback below relates to:
- TTL recommendations
- Selection of transport protocols
- EDNS Client Subnet (ECS)
- Missing mention of RFC 8906

More in-line below, including a couple of nits.


On 26. 11. 23 18:01, Shane Kerr wrote:

#### System Diversity

[typo]


> and my sometimes be
> hidden.

## DNS configuration knobs
### DNSSEC validation

[RFC9364](https://www.rfc-editor.org/rfc/rfc9364.html) provides a lot
of useful information, and links to further documents about DNSSEC.
However, operators usually do not need to know the details, and can
simply ensure that DNSSEC validation is enabled in their software;
this is usually enabled by default.


[Nit]

Oh I wish. E.g. PowerDNS does not enable validation by default and that's not a small player. I propose to remove the "; this is usually enabled by default." part as it might lead to sloppiness and IMHO does not really bring much.

### DNS Transport Protocols

**UDP and TCP must be supported.**

For: ALL DNS resolver operators.


I like the capital ALL :-)

UDP is what most clients use, and TCP is necessary for DNS answers
that are too large for a single UDP packet.


[Nit]

Maybe mention that UDP over 512 bytes is also okay? Check for stupid firewalls or something? Or maybe that's going into too much detail, I don't know.

### Packet Fragmentation Avoidance

**Servers should be configured to avoid fragmentation.**

For: ALL DNS resolver operators.

Packet fragmentation can cause issues with DNS over UDP, especially
over IPv6. These issues can be minimized by choosing implementations
that set IP options to avoid this, and by taking care with EDNS0
message sizes.

Recommendations are available in
[draft-ietf-dnsop-avoid-fragmentation](https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/).


[Nit]

I think linking to URL https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-avoid-fragmentation is better as it should point to the latest version of the document, even after it becomes an RFC.
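
To make "taking care with EDNS0 message sizes" a bit more concrete, here is a minimal sketch of the two wire-level pieces involved: the OPT pseudo-record that advertises a UDP payload size, and the TC (truncated) flag that tells the requester to retry over TCP. The function names are hypothetical, and the 1232-byte value in the usage note is only a commonly cited choice, not something taken from the draft text quoted above.

```python
import struct

OPT_RR_TYPE = 41      # RR type of the EDNS0 OPT pseudo-record
TC_FLAG = 0x0200      # "truncated" bit in the DNS header flags word


def build_opt_rr(udp_payload_size, dnssec_ok=False):
    """Return an OPT pseudo-RR advertising the given UDP payload size.

    Layout: root owner name, TYPE=OPT, CLASS=payload size,
    TTL=(extended RCODE, EDNS version, flags), RDLENGTH=0.
    """
    flags = 0x8000 if dnssec_ok else 0  # DO bit
    return b"\x00" + struct.pack(
        "!HHBBHH", OPT_RR_TYPE, udp_payload_size, 0, 0, flags, 0
    )


def is_truncated(response):
    """True if the response has the TC bit set, i.e. retry over TCP."""
    if len(response) < 12:
        raise ValueError("shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)
```

For example, `build_opt_rr(1232)` would be appended to the additional section of a query (with ARCOUNT incremented), and a UDP answer for which `is_truncated()` returns true should be retried over TCP.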

### Encrypted DNS

**DNS-over-TLS (DoT), DNS-over-HTTPS (DoH), and DNS-over-QUIC (DoQ)
should be supported.**

For: All DNS resolver operators.

DoT, DoH, and DoQ are different technologies that all provide an
encrypted channel between the resolver and the authoritative server.
DoT is the oldest, and provides encrypted DNS using TLS. DoH uses HTTP
over TLS as a way to transmit queries and answers, and is widely
supported by web browsers. DoQ is the newest, and provides advanced
features such as separate streams for each query, avoiding the "head
of line" blocking problem common with all protocols layered on top of
TCP (such as DoT and DoH).

- DoT
   - [RFC7858](https://www.rfc-editor.org/rfc/rfc7858.html)
- DoH
   - [RFC8484](https://www.rfc-editor.org/rfc/rfc8484.html)
- DoQ
   - [RFC9250](https://www.rfc-editor.org/rfc/rfc9250.html)


[Substantial]

This recommendation says "increase attack surface 3x". Each extra protocol comes with its operational cost, and I think supporting all of them without reason and sufficient knowledge is asking for trouble.

Operators will need to know how to debug not only DNS, but also the TLS+HTTP/2 combo (or QUIC), and none of this is for the fainthearted - especially when under DDoS.

My personal recommendation would be - pick the smallest set supported by the target population. Bonus points for DoT because it is by far the easiest to understand and debug when something goes wrong.

If the WG thinks supporting all of this protocol circus _at once_ is the best recommendation then I think this recommendation deserves a warning that operators need to do their homework first, understand how to debug individual pieces, and be prepared to handle protocol-specific DoS attacks.

### Aggressive NSEC caching

I agree with this paragraph and disagree with disagreements elsewhere on the mailing list :-)

### Local Root

**Local root should be used.**

For: Public resolver operators.

Since the root zone is DNSSEC signed,


^^^ Something is missing here, I guess?

Running a local root has several benefits, but it is an additional
component to maintain. For public resolver operators this is
definitely worth the cost, but other resolver operators may choose to
simply send all queries to the well-distributed root name servers.


[comment, no text change proposed]

With proper monitoring in place, sure, but it's not really buying much if aggressive caching is enabled. With RFC 8198 you get benefits of local root within seconds and with less operational complexity and fragility.

### TTL Recommendations

**TTL limits may be adjusted.**

For: All DNS resolver operators.

Software typically defaults to a maximum stored TTL of 1 or 2 days.
This may be lowered to reduce the cache size. A lower TTL will mean
removing rarely-used records that have long TTL, and should not have
much operational impact from a CPU or network point of view, but may
save memory.


[Substantial]

This section seems *entirely* incorrect to me. The cache needs some sort of limit on its size anyway, regardless of TTL limits. Artificially limiting TTLs is entirely ineffective as a method to limit cache size in many scenarios - e.g. when under random subdomain attack. A proper cache cleaning algorithm should take care of evicting least used records, and no TTL limits are needed.

I think this section should discuss the impact of very long TTLs on availability when someone messes up things on the auth side (slack.com DS, anyone?), or when the auth side is under attack.

https://ant.isi.edu/~johnh/PAPERS/Moura19b.html is an excellent resource possibly worth linking to.
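
To illustrate the point about size limits versus TTL limits, here is a toy sketch of a cache whose memory use is bounded by entry count, with least-recently-used eviction, independent of any TTL cap. The class and method names are hypothetical and this is not how any particular resolver implements its cache; it only shows that the size bound holds no matter how long the stored TTLs are.

```python
import time
from collections import OrderedDict


class BoundedCache:
    """Toy cache: bounded entry count, LRU eviction, TTL expiry on lookup."""

    def __init__(self, max_entries, clock=time.monotonic):
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def put(self, key, value, ttl):
        # Re-inserting moves the key to the "most recently used" end.
        self._entries.pop(key, None)
        self._entries[key] = (self.clock() + ttl, value)
        # The size limit is enforced here, whatever the TTLs are.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop least recently used

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # expired, treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value

    def __len__(self):
        return len(self._entries)
```

In this sketch a flood of unique names (as in a random subdomain attack) can never grow the cache past `max_entries`, while lowering the maximum TTL alone would not bound it at all.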

It is possible to set a minimum TTL in many implementations. This is a
violation of the DNS protocol, although may be useful to reduce load
from records with very low TTL (less than 5 seconds).


[nit]

I argue that setting lower bound higher than ~ seconds is antisocial and asking for operational trouble. On the other hand TTL=0 is antisocial from the auth side and should be outlawed :-) Personally I would be fine with recommending minimum TTL=1 second.
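
As a small illustration of what minimum and maximum TTL knobs do to a record before it is stored, here is a sketch of the clamping step. The function name and the default bounds (1 second and 1 day) are assumptions chosen to match the numbers discussed above, not the defaults of any specific implementation.

```python
def clamp_ttl(ttl, min_ttl=1, max_ttl=86400):
    """Return the TTL a cache would store given lower and upper bounds."""
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(ttl, max_ttl))
```

With these bounds a TTL of 0 is stored as 1 second, a TTL of 300 is unchanged, and a one-week TTL is cut to one day.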

### EDNS Client Subnet (ECS)

**ECS may be enabled.**


[Substantial]
Can we say something like
**ECS may be enabled if careful evaluation indicates it is beneficial.**
?


For: All DNS resolver operators.

EDNS Client Subnet (ECS) allows the resolver to include information
about the IP address of the client querying it when sending messages
to authoritative servers. This may allow authoritative servers to
provide different answers which are more appropriate for the client.
However, ECS will increase the amount of cache space required by
resolvers, may reduce DNS performance, and may have privacy
implications.

It most certainly _will_ (not may) reduce DNS performance. But it reportedly increases non-DNS performance in certain scenarios :shrug:


[Substantial]
At this point in text I would like to prepend a sentence like this:

"A resolver operator whose clients share a single network path to the Internet will see no benefit at all."

A resolver operator that has clients that are limited to a specific
region may see no benefit. A resolver operator that has a widely
distributed anycast network may not have much benefit from ECS, since
the locations that initiate the query will be close to the client. But
a resolver operator that answers client queries only from a few
locations, and expects clients to come from a wide area, may provide
better service for end-users by supporting ECS.

EDNS client subnet is described in
[RFC7871](https://www.rfc-editor.org/rfc/rfc7871.html), an
informational RFC.
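
For readers who have not looked at the wire format, here is a minimal sketch of how a resolver could encode the ECS option it sends upstream, following the option layout in RFC 7871 (option code 8, address family, source prefix length, scope prefix length, truncated address). The function name is hypothetical, and the prefix lengths in the usage note are only illustrative choices.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet


def build_ecs_option(client_ip, source_prefix_len):
    """Encode an ECS option for a query sent to an authoritative server."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Zero all bits beyond the prefix, then keep only the bytes needed.
    network = ipaddress.ip_network(f"{addr}/{source_prefix_len}", strict=False)
    n_bytes = (source_prefix_len + 7) // 8
    truncated = network.network_address.packed[:n_bytes]
    # Scope prefix length is 0 in queries; the responder fills it in.
    payload = struct.pack("!HBB", family, source_prefix_len, 0) + truncated
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload
```

For example, `build_ecs_option("192.0.2.77", 24)` carries only `192.0.2`, so the authoritative server learns the client's /24 rather than the full address. Each distinct prefix can produce a separately cached answer, which is where the extra cache space comes from.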

### Trust Anchor Reporting

**Trust anchor reporting may be enabled.**

[nit]

I would say it should be enabled. It costs almost nothing and has negligible risks for anyone.


-----------

Now the hard part - missing pieces.

This one is hard to put under existing headings. Should it be under ### Software considerations or ### Networking considerations or elsewhere?


Anyway: RFC 8906
A Common Operational Problem in DNS Servers: Failure to Communicate
https://datatracker.ietf.org/doc/html/rfc8906

If an operator puts a "security appliance" in front of the DNS server to increase its purported "security" it messes up the protocol and breaks things. Most importantly, failure to respond to _ALL_ queries (because the "appliance thinks some queries are not safe or needed") leads to exploitable protocol-level issues. See e.g. paper about trouble in resolver-auth transactions:

Silence is not Golden: Disrupting the Load Balancing of Authoritative DNS Servers

https://indico.dns-oarc.net/event/47/contributions/1018/

Similarly, non-response to stub clients _also_ creates problems because stubs are notoriously bad at handling retransmissions in a timely manner etc.

I think this is worth calling out. "Even if your server is not going to answer a query, send back at least RCODE REFUSED." or something like that.
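
To show how little it takes to answer instead of staying silent, here is a minimal sketch that turns any raw query into a REFUSED response: same ID, QR bit set, Opcode and RD copied from the query, RCODE 5, and the question echoed back when it can be parsed. The function names are hypothetical, and the sketch deliberately ignores EDNS (a real server would also need to handle the OPT record).

```python
import struct

RCODE_REFUSED = 5


def _question_end(msg):
    """Return the offset just past the first question, or None if malformed."""
    pos = 12
    while pos < len(msg):
        length = msg[pos]
        if length == 0:
            end = pos + 1 + 4  # root label, then QTYPE and QCLASS
            return end if end <= len(msg) else None
        if length > 63:  # pointer or reserved label type, not expected here
            return None
        pos += 1 + length
    return None


def refused_response(query):
    """Build a REFUSED reply for a raw DNS query message."""
    if len(query) < 12:
        raise ValueError("shorter than a DNS header")
    qid, flags, qdcount = struct.unpack("!HHH", query[:6])
    # QR=1, keep Opcode (0x7800) and RD (0x0100), set RCODE=REFUSED.
    resp_flags = 0x8000 | (flags & 0x7800) | (flags & 0x0100) | RCODE_REFUSED
    question = b""
    if qdcount == 1:
        end = _question_end(query)
        if end is not None:
            question = query[12:end]
    header = struct.pack(
        "!HHHHHH", qid, resp_flags, 1 if question else 0, 0, 0, 0
    )
    return header + question
```

A policy layer that decides not to serve a query can return `refused_response(query)` to the sender rather than dropping the packet, so the other side gets a definite answer instead of waiting for a timeout.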



Congratulations if you made it this far - and thank you for your time!

--
Petr Špaček
Internet Systems Consortium
