Hello everyone,
thank you for your hard work on this. I think it's a well-written document.
More substantial feedback below relates to:
- TTL recommendations
- Selection of transport protocols
- EDNS Client Subnet (ECS)
- Missing mention of RFC 8906
More in-line below, including a couple of nits.
On 26. 11. 23 18:01, Shane Kerr wrote:
#### System Diversity
[typo]
> and my sometimes be
> hidden.
## DNS configuration knobs
### DNSSEC validation
[RFC9364](https://www.rfc-editor.org/rfc/rfc9364.html) provides a lot
of useful information, and links to further documents about DNSSEC.
However, operators usually do not need to know the details, and can
simply ensure that DNSSEC validation is enabled in their software;
this is usually enabled by default.
[Nit]
Oh I wish. E.g. PowerDNS does not enable validation by default and
that's not a small player. I propose to remove the "; this is usually
enabled by default." part as it might lead to sloppiness and IMHO does
not really bring much.
### DNS Transport Protocols
**UDP and TCP must be supported.**
For: ALL DNS resolver operators.
I like the capital ALL :-)
UDP is what most clients use, and TCP is necessary for DNS answers
that are too large for a single UDP packet.
[Nit]
Maybe mention that UDP over 512 bytes is also okay? Check for stupid
firewalls or something? Or maybe that's going into too much detail, I
don't know.
### Packet Fragmentation Avoidance
**Servers should be configured to avoid fragmentation.**
For: ALL DNS resolver operators.
Packet fragmentation can cause issues with DNS over UDP, especially
over IPv6. These issues can be minimized by choosing implementations
that set IP options to avoid this, and by taking care with EDNS0
message sizes.
Recommendations are available in
[draft-ietf-dnsop-avoid-fragmentation](https://datatracker.ietf.org/doc/draft-ietf-dnsop-avoid-fragmentation/).
[Nit]
I think linking to URL
https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-avoid-fragmentation
is better as it should point to the latest version of the document, even
after it becomes an RFC.
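To make the "taking care with EDNS0 message sizes" part concrete, here
is a minimal Python sketch of a query that advertises a conservative
EDNS0 UDP payload size. The function names and the 1232-byte default are
my assumptions for illustration (1232 is the value popularized by DNS
Flag Day 2020, not necessarily what the draft ends up recommending), and
no particular implementation builds its queries this way.

```python
import struct

def encode_name(name):
    """Encode a domain name as length-prefixed labels (no compression)."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, qtype=1, udp_size=1232, query_id=0x1234):
    """Build a DNS query carrying an EDNS0 OPT record.

    udp_size is the largest UDP response we claim to accept; keeping it
    below the path MTU is what avoids IP fragmentation of the answer.
    """
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # Question: QNAME, QTYPE, QCLASS=IN
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    # OPT pseudo-RR: root owner name, TYPE=41, CLASS field = UDP payload
    # size, TTL field = extended RCODE/version/flags (all zero), RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt
```

Answers that do not fit into the advertised size come back truncated
and are retried over TCP, which is one more reason TCP must work.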
### Encrypted DNS
**DNS-over-TLS (DoT), DNS-over-HTTPS (DoH), and DNS-over-QUIC (DoQ)
should be supported.**
For: All DNS resolver operators.
DoT, DoH, and DoQ are different technologies that all provide an
encrypted channel between the resolver and the authoritative server.
DoT is the oldest, and provides encrypted DNS using TLS. DoH uses HTTP
over TLS as a way to transmit queries and answers, and is widely
supported by web browsers. DoQ is the newest, and provides advanced
features such as separate streams for each query, avoiding the "head
of line" blocking problem common with all protocols layered on top of
TCP (such as DoT and DoH).
- DoT
- [RFC7858](https://www.rfc-editor.org/rfc/rfc7858.html)
- DoH
- [RFC8484](https://www.rfc-editor.org/rfc/rfc8484.html)
- DoQ
- [RFC9250](https://www.rfc-editor.org/rfc/rfc9250.html)
[Substantial]
This recommendation says "increase attack surface 3x". Each extra
protocol comes with its operational cost, and I think supporting all of
them without reason and sufficient knowledge is asking for trouble.
Operators will need to know how to debug not only DNS, but also the
TLS+HTTP/2 combo (or QUIC), and none of this is for the fainthearted,
especially when under DDoS.
My personal recommendation would be: pick the smallest set supported by
the target population. Bonus points for DoT because it is by far the
easiest to understand and debug when something goes wrong.
If the WG thinks supporting all of this protocol circus _at once_ is the
best recommendation then I think this recommendation deserves a warning
that operators need to do their homework first, understand how to debug
individual pieces, and be prepared to handle protocol-specific DoS attacks.
### Aggressive NSEC caching
I agree with this paragraph and disagree with disagreements elsewhere on
the mailing list :-)
### Local Root
**Local root should be used.**
For: Public resolver operators.
Since the root zone is DNSSEC signed,
^^^ Something is missing here, I guess?
Running a local root has several benefits, but it is an additional
component to maintain. For public resolver operators this is
definitely worth the cost, but other resolver operators may choose to
simply send all queries to the well-distributed root name servers.
[comment, no text change proposed]
With proper monitoring in place, sure, but it's not really buying much
if aggressive caching is enabled. With RFC 8198 you get benefits of
local root within seconds and with less operational complexity and
fragility.
### TTL Recommendations
**TTL limits may be adjusted.**
For: All DNS resolver operators.
Software typically defaults to a maximum stored TTL of 1 or 2 days.
This may be lowered to reduce the cache size. A lower TTL will mean
removing rarely-used records that have long TTL, and should not have
much operational impact from a CPU or network point of view, but may
save memory.
[Substantial]
This section seems *entirely* incorrect to me. Cache needs some sort of
limit on its size anyway, regardless of TTL limits. Artificially
limiting TTLs is entirely ineffective as a method to limit cache size in
many scenarios - e.g. when under random subdomain attack. A proper cache
cleaning algorithm should take care of evicting least used records, and
no TTL limits are needed.
I think this section should discuss impact of very long TTLs on
availability when someone messes up things on the auth side (slack.com
DS, anyone?), or when the auth side is under attack.
https://ant.isi.edu/~johnh/PAPERS/Moura19b.html is an excellent resource
possibly worth linking to.
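To illustrate the point about cache size, here is a toy Python sketch of
a cache with a hard entry limit and least-recently-used eviction. The
class and its methods are hypothetical and much simpler than what real
resolvers do; it only shows that the size bound holds no matter what
TTLs the records carry.

```python
import time
from collections import OrderedDict

class BoundedCache:
    """Toy DNS cache: hard cap on entries, least-recently-used eviction."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()  # key -> (expires_at, value)

    def __len__(self):
        return len(self._data)

    def put(self, key, value, ttl, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (now + ttl, value)
        self._data.move_to_end(key)          # newest entry is most recent
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least recently used

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= now:                # expired: drop and miss
            del self._data[key]
            return None
        self._data.move_to_end(key)          # a hit refreshes recency
        return value
```

Under a random subdomain flood the junk entries push each other out,
while a record that keeps getting hits stays cached regardless of
whether its TTL is a minute or a week.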
It is possible to set a minimum TTL in many implementations. This is a
violation of the DNS protocol, although may be useful to reduce load
from records with very low TTL (less than 5 seconds).
[nit]
I argue that setting a lower bound higher than ~ seconds is antisocial
and asking for operational trouble. On the other hand, TTL=0 is
antisocial from the auth side and should be outlawed :-) Personally I
would be fine with recommending a minimum TTL of 1 second.
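For clarity, this is all a TTL limit amounts to when a record is stored.
A sketch only: the function name is made up, the 1-second floor is my
suggestion from above, and the 1-day ceiling is the typical default the
draft mentions.

```python
def clamp_ttl(ttl, min_ttl=1, max_ttl=86400):
    """Return the TTL a resolver would store for a record.

    min_ttl: floor in seconds (1 s turns TTL=0 into something cacheable)
    max_ttl: ceiling in seconds (86400 s = 1 day)
    """
    return max(min_ttl, min(ttl, max_ttl))
```

A record published with TTL=0 is kept for one second, a one-week TTL is
cut down to one day, and everything in between is left alone.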
### EDNS Client Subnet (ECS)
**ECS may be enabled.**
[Substantial]
Can we say something like
**ECS may be enabled if careful evaluation indicates it is beneficial.**
?
For: All DNS resolver operators.
EDNS Client Subnet (ECS) allows the resolver to include information
about the IP address of the client querying it when sending messages
to authoritative servers. This may allow authoritative servers to
provide different answers which are more appropriate for the client.
However, ECS will increase the amount of cache space required by
resolvers, may reduce DNS performance, and may have privacy
implications.
It most certainly _will_ (not may) reduce DNS performance. But it
reportedly increases non-DNS performance in certain scenarios :shrug:
[Substantial]
At this point in text I would like to prepend a sentence like this:
"A resolver operator whose clients share a single network path to the
Internet will see no benefit at all."
A resolver operator that has clients that are limited to a specific
region may see no benefit. A resolver operator that has a widely
distributed anycast network may not have much benefit from ECS, since
the locations that initiate the query will be close to the client. But
a resolver operator that answers client queries only from a few
locations, and expects clients to come from a wide area, may provide
better service for end-users by supporting ECS.
EDNS client subnet is described in
[RFC7871](https://www.rfc-editor.org/rfc/rfc7871.html), an
informational RFC.
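The cache-space cost comes from the cache key gaining a subnet
dimension. A rough Python sketch, with hypothetical function names; the
/24 and /56 truncation lengths are the maximums RFC 7871 recommends, and
real implementations scope cached answers in a more involved way than
this.

```python
import ipaddress

def ecs_prefix(client_ip, v4_bits=24, v6_bits=56):
    """Truncate a client address to the prefix that would be sent in ECS."""
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error
    return str(ipaddress.ip_network("%s/%d" % (addr, bits), strict=False))

def cache_key(qname, qtype, client_ip=None):
    """Cache key without ECS (client_ip=None) or with ECS (per subnet)."""
    subnet = ecs_prefix(client_ip) if client_ip else None
    return (qname.lower(), qtype, subnet)
```

Without ECS every client shares one cache entry per name and type. With
ECS, clients in different subnets each produce their own entry and their
own upstream query, which is where the extra memory and latency go.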
### Trust Anchor Reporting
**Trust anchor reporting may be enabled.**
[nit]
I would say it should be enabled. It costs almost nothing and has
negligible risks for anyone.
-----------
Now the hard part - missing pieces.
This one is hard to put under existing headings. Should it be under ###
Software considerations or ### Networking considerations or elsewhere?
Anyway: RFC 8906
A Common Operational Problem in DNS Servers: Failure to Communicate
https://datatracker.ietf.org/doc/html/rfc8906
If an operator puts a "security appliance" in front of the DNS server to
increase its purported "security", it messes up the protocol and breaks
things. Most importantly, failure to respond to _ALL_ queries (because
the "appliance" thinks some queries are not safe or needed) leads to
exploitable protocol-level issues. See e.g. the paper about trouble in
resolver-auth transactions:
Silence is not Golden: Disrupting the Load Balancing of Authoritative
DNS Servers
https://indico.dns-oarc.net/event/47/contributions/1018/
Similarly, non-response to stub clients _also_ creates problems because
stubs are notoriously bad at handling retransmissions in a timely manner.
I think this is worth calling out. "Even if your server is not going to
answer a query, send back at least RCODE REFUSED." or something like that.
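As a sketch of what "send back at least REFUSED" means on the wire, here
is a minimal Python example. The function names are hypothetical, and it
deliberately ignores EDNS: a real server should also return an OPT
record when the query carried one.

```python
import struct

RCODE_REFUSED = 5

def _question_end(msg, qdcount):
    """Return the offset just past the question section."""
    pos = 12
    for _ in range(qdcount):
        while True:
            length = msg[pos]
            if length & 0xC0:
                raise ValueError("unexpected label type in question")
            pos += 1 + length
            if length == 0:
                break
        pos += 4  # QTYPE + QCLASS
    if pos > len(msg):
        raise ValueError("truncated question")
    return pos

def refused_response(query):
    """Build a REFUSED reply for a query we are not going to answer."""
    if len(query) < 12:
        return None  # not even a full header, nothing to echo
    qid, flags, qdcount = struct.unpack("!HHH", query[:6])
    if flags & 0x8000:
        return None  # already a response; never reply to responses
    # Set QR, keep the opcode and RD bits from the query, RCODE = REFUSED
    resp_flags = 0x8000 | (flags & 0x7900) | RCODE_REFUSED
    try:
        question = query[12:_question_end(query, qdcount)]
    except (IndexError, ValueError):
        qdcount, question = 0, b""  # malformed: reply with header only
    header = struct.pack("!HHHHHH", qid, resp_flags, qdcount, 0, 0, 0)
    return header + question
```

The point is that the client gets a matching ID and a definite answer
right away instead of waiting for a timeout and retrying.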
Congratulations if you made it this far - and thank you for your time!
--
Petr Špaček
Internet Systems Consortium