On Thu 2021-11-11 16:16:24 +0000, Jim Reid wrote:
>> On 11 Nov 2021, at 15:28, Christian Huitema <[email protected]> wrote:
>>
>> It is not uncommon to see upgrades being rolled out at different times to
>> different servers in the farm. Opportunistic strategies and probing
>> strategies have to deal with that.
>
> This can be complex. Lots of busy domain names (like TLDs) use a combination
> of DNS servers that are managed and operated by different organisations using
> different flavours of software for the obvious SPoF reasons. Which means
> upgrades can be like changing a plane's engines in mid-flight. For instance,
> look at how long it took for all 12 RSOs to be in a position to support
> DNSSEC and IPv6.
Thanks for this discussion, y'all.

I've tried to capture these thoughts with some additional text in the draft, which you can see here:

https://gitlab.com/dkg/dprive-unilateral-probing/-/commit/477721af91dc517a0696c27a7ae3b6a97f8795a3

In particular, there's a section about how authoritatives can more safely deploy in a heterogeneous pooled or anycasted situation:

------------

## Pooled Authoritative Servers Behind a Single IP Address {#authoritative-pools}

Some authoritative DNS servers are structured as a pool of authoritatives standing behind a load balancer that runs on a single IP address, forwarding queries to members of the pool. In such a deployment, individual members of the pool typically get updated independently from each other.

A recursive resolver following the guidance in {{recursive-guidance}} that interacts with such a pool likely does not know that it is a pool. If some members of the pool are updated to follow this guidance while others are not, the recursive client might see the pool as a single authoritative server that sometimes offers and sometimes refuses encrypted transport.

To avoid incurring additional minor timeouts for such a recursive resolver, the pool operator should either:

- ensure that all members of the pool enable the same encrypted transport(s) simultaneously, or
- ensure that the load balancer maps client requests to pool members based on client IP addresses.

Similar concerns apply to authoritative servers responding from an anycast IP address. As long as the pool of servers is in a heterogeneous state, any flapping route that switches a given client IP address to a different responder risks incurring an additional timeout. Frequent changes of routing for anycast listening IP addresses are also likely to cause problems for TLS, TCP, or QUIC connection state, so stable routes are important to ensure that the service remains available and responsive.
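As a rough illustration of the second option, here is a minimal Python sketch, assuming a hypothetical three-member pool in which only some members have enabled encrypted transport yet. The names `POOL`, `member_for_client`, and `round_robin` are invented for this example and do not correspond to any real load balancer's configuration. Hashing the client's IP address pins each client to one member, so that client sees a consistent answer about encrypted transport; per-query round robin does not.

```python
import hashlib
import ipaddress

# Hypothetical pool during a partial rollout:
# (member name, whether it has enabled encrypted transport yet).
POOL = [
    ("auth-a", True),   # already upgraded
    ("auth-b", False),  # not yet upgraded
    ("auth-c", True),   # already upgraded
]


def member_for_client(client_ip: str) -> tuple[str, bool]:
    """Pick a pool member deterministically from the client's IP address."""
    packed = ipaddress.ip_address(client_ip).packed  # works for IPv4 and IPv6
    digest = hashlib.sha256(packed).digest()
    index = int.from_bytes(digest[:8], "big") % len(POOL)
    return POOL[index]


def round_robin(counter: int) -> tuple[str, bool]:
    """Per-query round robin, for comparison: ignores the client address."""
    return POOL[counter % len(POOL)]


if __name__ == "__main__":
    client = "192.0.2.100"
    sticky = {member_for_client(client)[1] for _ in range(10)}
    rotating = {round_robin(i)[1] for i in range(10)}
    print("client-IP mapping sees encryption as:", sticky)  # a single value
    print("round robin sees encryption as:", rotating)      # both False and True
```

A plain modulo hash like this reshuffles most clients whenever the pool size changes; a consistent-hashing scheme would limit that churn. Either way, the property that matters here is that a given client IP keeps landing on the same member while the rollout is in progress.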
------------

and a bit of a reminder for operators of recursive resolvers:

------------

### Separate State for Each of the Recursive Resolver's Own IP Addresses {#resolver-binding}

Note that the recursive resolver should record this per-authoritative-IP state for each IP address it uses as it sends its queries. For example, if a recursive resolver can send a packet to authoritative servers from IP addresses 192.0.2.100 and 192.0.2.200, it should keep two distinct sets of per-authoritative-IP state, one for each source address it uses. Keeping these state tables distinct for each source address makes it possible for a pooled authoritative server behind a load balancer to do a partial rollout while minimizing accidental timeouts (see {{authoritative-pools}}).

------------

We'll include something along these lines in draft -01. If you'd like to propose fixes or raise concerns, Joey and I would be happy to incorporate them.

--dkg
_______________________________________________
dns-privacy mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dns-privacy
