Hello everyone,

Thank you for the hard work on this. I think it's a well-written document.

More substantial feedback below relates to:
- TTL recommendations
- Selection of transport protocols
- EDNS Client Subnet (ECS)
- Missing mention of RFC 8906

More in-line below, including a couple of nits.


On 26. 11. 23 18:01, Shane Kerr wrote:
#### System Diversity

[typo]

> and my sometimes be
> hidden.


## DNS configuration knobs
### DNSSEC validation

[RFC9364](https://www.rfc-editor.org/rfc/rfc9364.html) provides a lot
of useful information, and links to further documents about DNSSEC.
However, operators usually do not need to know the details, and can
simply ensure that DNSSEC validation is enabled in their software;
this is usually enabled by default.

[Nit]
Oh, I wish. E.g. PowerDNS does not enable validation by default, and that's not a small player. I propose removing the "; this is usually enabled by default." part, as it might lead to sloppiness and IMHO does not really add much.

### DNS Transport Protocols

**UDP and TCP must be supported.**

For: ALL DNS resolver operators.

I like the capital ALL :-)

UDP is what most clients use, and TCP is necessary for DNS answers
that are too large for a single UDP packet.

[Nit]
Maybe mention that UDP over 512 bytes is also okay? Check for stupid firewalls or something? Or maybe that's going into too much detail, I don't know.

### Packet Fragmentation Avoidance

**Servers should be configured to avoid fragmentation.**

For: ALL DNS resolver operators.

Packet fragmentation can cause issues with DNS over UDP, especially
over IPv6. These issues can be minimized by choosing implementations
that set IP options to avoid this, and by taking care with EDNS0
message sizes.

Recommendations are available in
[draft-ietf-dnsop-avoid-fragmentation](https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/).

[Nit]
I think linking to URL https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-avoid-fragmentation is better as it should point to the latest version of the document, even after it becomes an RFC.
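
To make "taking care with EDNS0 message sizes" concrete, here is a minimal sketch of a query that advertises a conservative UDP payload size in its OPT record. The helper names are hypothetical, and the 1232-byte default is an assumption (the value popularised by DNS Flag Day 2020); it is not taken from the document under review.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, udp_size: int = 1232,
                msg_id: int = 0x1234) -> bytes:
    """Build a DNS query carrying an EDNS0 OPT record (hypothetical helper)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0,
    # ARCOUNT=1 because the OPT pseudo-record lives in the additional section.
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-RR: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL=extended RCODE/version/flags (all zero here), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


query = build_query("example.com")
```

The CLASS field of the OPT record is where the sender states the largest UDP response it is willing to receive, so keeping that value below the path MTU is one way to avoid fragmented answers.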

### Encrypted DNS

**DNS-over-TLS (DoT), DNS-over-HTTPS (DoH), and DNS-over-QUIC (DoQ)
should be supported.**

For: All DNS resolver operators.

DoT, DoH, and DoQ are different technologies that all provide an
encrypted channel between the resolver and the authoritative server.
DoT is the oldest, and provides encrypted DNS using TLS. DoH uses HTTP
over TLS as a way to transmit queries and answers, and is widely
supported by web browsers. DoQ is the newest, and provides advanced
features such as separate streams for each query, avoiding the "head
of line" blocking problem common with all protocols layered on top of
TCP (such as DoT and DoH).

- DoT
   - [RFC7858](https://www.rfc-editor.org/rfc/rfc7858.html)
- DoH
   - [RFC8484](https://www.rfc-editor.org/rfc/rfc8484.html)
- DoQ
   - [RFC9250](https://www.rfc-editor.org/rfc/rfc9250.html)

[Substantial]

This recommendation says "increase attack surface 3x". Each extra protocol comes with its operational cost, and I think supporting all of them without reason and sufficient knowledge is asking for trouble.

Operators will need to know how to debug not only DNS, but also the TLS+HTTP/2 combo (or QUIC), and none of this is for the fainthearted - especially when under DDoS.

My personal recommendation would be: pick the smallest set supported by the target population. Bonus points for DoT because it is by far the easiest to understand and debug when something goes wrong.

If the WG thinks supporting all of this protocol circus _at once_ is the best recommendation then I think this recommendation deserves a warning that operators need to do their homework first, understand how to debug individual pieces, and be prepared to handle protocol-specific DoS attacks.


### Aggressive NSEC caching

I agree with this paragraph and disagree with disagreements elsewhere on
the mailing list :-)


### Local Root

**Local root should be used.**

For: Public resolver operators.

Since the root zone is DNSSEC signed,

^^^ Something is missing here, I guess?


Running a local root has several benefits, but it is an additional
component to maintain. For public resolver operators this is
definitely worth the cost, but other resolver operators may choose to
simply send all queries to the well-distributed root name servers.

[comment, no text change proposed]
With proper monitoring in place, sure, but it's not really buying much if aggressive caching is enabled. With RFC 8198 you get the benefits of a local root within seconds, and with less operational complexity and fragility.


### TTL Recommendations

**TTL limits may be adjusted.**

For: All DNS resolver operators.

Software typically defaults to a maximum stored TTL of 1 or 2 days.
This may be lowered to reduce the cache size. A lower TTL will mean
removing rarely-used records that have long TTL, and should not have
much operational impact from a CPU or network point of view, but may
save memory.

[Substantial]

This section seems *entirely* incorrect to me. A cache needs some sort of limit on its size anyway, regardless of TTL limits. Artificially limiting TTLs is entirely ineffective as a method to limit cache size in many scenarios, e.g. when under a random subdomain attack. A proper cache cleaning algorithm should take care of evicting the least used records, and no TTL limits are needed.

I think this section should discuss the impact of very long TTLs on availability when someone messes things up on the auth side (slack.com DS, anyone?), or when the auth side is under attack.

https://ant.isi.edu/~johnh/PAPERS/Moura19b.html is an excellent resource possibly worth linking to.
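
To illustrate the point about size limits versus TTL limits, here is a minimal sketch of a cache bounded by entry count with least-recently-used eviction. The `BoundedCache` class and its parameters are hypothetical and do not describe the algorithm of any particular resolver.

```python
from collections import OrderedDict


class BoundedCache:
    """Toy cache: hard cap on entries, LRU eviction, TTL-based expiry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()  # key -> (expires_at, value)

    def put(self, key, value, ttl, now):
        self._data[key] = (now + ttl, value)
        self._data.move_to_end(key)  # mark as most recently used
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if now >= expires_at:
            del self._data[key]  # expired: drop and report a miss
            return None
        self._data.move_to_end(key)  # a hit refreshes recency
        return value

    def __len__(self):
        return len(self._data)


# Simulated random subdomain attack: many unique short-TTL names arrive
# at the same instant while one popular long-TTL record keeps being used.
cache = BoundedCache(max_entries=100)
cache.put("www.example.com", "A 192.0.2.1", ttl=86400, now=0)
for i in range(10000):
    cache.put(f"r{i}.attack.example", "NXDOMAIN", ttl=5, now=0)
    cache.get("www.example.com", now=0)
```

In this toy run the cache never exceeds 100 entries and the popular record survives, even though no TTL was capped. A lower maximum TTL would not have helped here, because the attack entries already have short TTLs and all arrive before any of them expire.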

It is possible to set a minimum TTL in many implementations. This is a
violation of the DNS protocol, although may be useful to reduce load
from records with very low TTL (less than 5 seconds).

[nit]
I argue that setting a lower bound higher than ~ seconds is antisocial and asking for operational trouble. On the other hand, TTL=0 is antisocial from the auth side and should be outlawed :-) Personally I would be fine with recommending a minimum TTL of 1 second.


### EDNS Client Subnet (ECS)

**ECS may be enabled.**

[Substantial]
Can we say something like
**ECS may be enabled if careful evaluation indicates it is beneficial.**
?


For: All DNS resolver operators.

EDNS Client Subnet (ECS) allows the resolver to include information
about the IP address of the client querying it when sending messages
to authoritative servers. This may allow authoritative servers to
provide different answers which are more appropriate for the client.
However, ECS will increase the amount of cache space required by
resolvers, may reduce DNS performance, and may have privacy
implications.

It most certainly _will_ (not may) reduce DNS performance. But it reportedly increases non-DNS performance in certain scenarios :shrug:

[Substantial]
At this point in text I would like to prepend a sentence like this:
"A resolver operator whose clients share a single network path to the Internet will see no benefit at all."

A resolver operator that has clients that are limited to a specific
region may see no benefit. A resolver operator that has a widely
distributed anycast network may not have much benefit from ECS, since
the locations that initiate the query will be close to the client. But
a resolver operator that answers client queries only from a few
locations, and expects clients to come from a wide area, may provide
better service for end-users by supporting ECS.

EDNS client subnet is described in
[RFC7871](https://www.rfc-editor.org/rfc/rfc7871.html), an
informational RFC.
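
For readers weighing the privacy implications, here is a minimal sketch of how the ECS option from RFC 7871 is laid out on the wire. The function name is hypothetical; the field layout (option code 8, address family, source and scope prefix lengths, truncated address) follows the RFC as I understand it.

```python
import ipaddress
import struct


def encode_ecs_option(client_ip: str, source_prefix: int) -> bytes:
    """Encode an EDNS Client Subnet option for a query (hypothetical helper)."""
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # IANA address family numbers
    # Zero the host bits so nothing beyond the prefix is disclosed.
    network = ipaddress.ip_network(f"{client_ip}/{source_prefix}", strict=False)
    nbytes = (source_prefix + 7) // 8  # only as many octets as the prefix needs
    truncated = network.network_address.packed[:nbytes]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    data = struct.pack("!HBB", family, source_prefix, 0) + truncated
    # OPTION-CODE 8, OPTION-LENGTH, then the option data
    return struct.pack("!HH", 8, len(data)) + data


option = encode_ecs_option("192.0.2.55", 24)
```

The source prefix length is the knob that trades answer locality against client privacy: with `/24` the last octet of the IPv4 address never leaves the resolver, while each distinct prefix still needs its own cache entry.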


### Trust Anchor Reporting

**Trust anchor reporting may be enabled.**
[nit]
I would say it should be enabled. It costs almost nothing and has negligible risks for anyone.

-----------

Now the hard part - missing pieces.

This one is hard to put under existing headings. Should it be under ### Software considerations or ### Networking considerations or elsewhere?

Anyway: RFC 8906
A Common Operational Problem in DNS Servers: Failure to Communicate
https://datatracker.ietf.org/doc/html/rfc8906

If an operator puts a "security appliance" in front of the DNS server to increase its purported "security", it messes up the protocol and breaks things. Most importantly, failure to respond to _ALL_ queries (because the "appliance" thinks some queries are not safe or needed) leads to exploitable protocol-level issues. See e.g. this paper about trouble in resolver-auth transactions:

Silence is not Golden: Disrupting the Load Balancing of Authoritative DNS Servers
https://indico.dns-oarc.net/event/47/contributions/1018/

Similarly, non-response to stub clients _also_ creates problems because stubs are notoriously bad at handling retransmissions in a timely manner etc.

I think this is worth calling out. "Even if your server is not going to answer a query, send back at least RCODE REFUSED." or something like that.
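
A minimal sketch of what "send back at least RCODE REFUSED" could look like at the wire level. The helper names are hypothetical, and the sketch is deliberately simplified: it echoes only the question section and omits the OPT record that RFC 6891 expects in a response to a query that carried one.

```python
import struct

REFUSED = 5  # RCODE value for "Query Refused"


def question_end(query: bytes) -> int:
    """Return the offset just past the first question, or -1 if unparsable."""
    pos = 12  # the question section starts right after the 12-byte header
    while pos < len(query):
        length = query[pos]
        if length == 0:
            end = pos + 1 + 4  # root label plus QTYPE and QCLASS
            return end if end <= len(query) else -1
        if length > 63:  # compression pointer or reserved label type
            return -1
        pos += 1 + length
    return -1


def refused_response(query: bytes) -> bytes:
    """Build a REFUSED reply instead of silently dropping the query."""
    if len(query) < 12:
        raise ValueError("shorter than a DNS header")
    msg_id, flags, qdcount = struct.unpack("!HHH", query[:6])
    # Set QR, copy the opcode (0x7800) and RD (0x0100) bits, set the RCODE.
    resp_flags = 0x8000 | (flags & 0x7900) | REFUSED
    end = question_end(query) if qdcount == 1 else -1
    question = query[12:end] if end > 0 else b""
    header = struct.pack("!HHHHHH", msg_id, resp_flags,
                         1 if question else 0, 0, 0, 0)
    return header + question
```

Echoing the message ID and the question lets the sender match the reply to its query and move on immediately, rather than waiting for a timeout and retrying.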


Congratulations if you made it this far - and thank you for your time!

--
Petr Špaček
Internet Systems Consortium
