On Thu 2021-11-11 16:16:24 +0000, Jim Reid wrote:
>> On 11 Nov 2021, at 15:28, Christian Huitema <[email protected]> wrote:
>> 
>> It is not uncommon to see upgrades being rolled out at different times to 
>> different servers in the farm. Opportunistic strategies and probing 
>> strategies have to deal with that.
>
> This can be complex. Lots of busy domain names (like TLDs) use a combination 
> of DNS servers that are managed and operated by different organisations using 
> different flavours of software for the obvious SPoF reasons. Which means 
> upgrades can be like changing a plane's engines in mid-flight. For instance, 
> look at how long it took for all 12 RSOs to be in a position to support 
> DNSSEC and IPv6.

Thanks for this discussion, y'all.

I've tried to capture these thoughts with some additional text in the
draft, which you can see here:

  
https://gitlab.com/dkg/dprive-unilateral-probing/-/commit/477721af91dc517a0696c27a7ae3b6a97f8795a3

In particular, there's a section about how authoritatives can more
safely deploy in a heterogeneous pooled or anycasted situation:

------------
## Pooled Authoritative Servers Behind a Single IP Address {#authoritative-pools}

Some authoritative DNS servers are structured as a pool of authoritatives 
standing behind a load-balancer that runs on a single IP address, forwarding 
queries to members of the pool.

In such a deployment, individual members of the pool typically get updated 
independently from each other.

A recursive resolver following the guidance in {{recursive-guidance}} that 
interacts with such a pool likely does not know that it is a pool.
If some members of the pool are updated to follow this guidance while others 
are not, the recursive client might see the pool as a single authoritative 
server that sometimes offers and sometimes refuses encrypted transport.

To avoid incurring additional minor timeouts for such a recursive resolver, the 
pool operator should either:

- ensure that all members of the pool enable the same encrypted transport(s) 
simultaneously, or
- ensure that the load balancer maps client requests to pool members based on 
client IP addresses.

Similar concerns apply to authoritative servers responding from an anycast IP 
address.
As long as the pool of servers is in a heterogeneous state, any flapping route 
that switches a given client IP address to a different responder risks 
incurring an additional timeout.
Frequent changes of routing for anycast listening IP addresses are also likely 
to cause problems for TLS, TCP, or QUIC connection state, so stable 
routes are important to ensure that the service remains available and 
responsive.

------------
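
As an aside (an illustrative sketch only, not text proposed for the draft):
the second option in that list, mapping client requests to pool members based
on client IP addresses, could be approximated by hashing the source address.
The function name `pick_member`, the pool addresses, and the use of SHA-256
are all assumptions made for illustration; real load balancers typically
expose their own source-hash or affinity settings.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_member(client_ip: str, members: Sequence[str]) -> str:
    """Map a client IP address to one pool member, deterministically.

    Hypothetical helper: the same client address always yields the same
    member as long as the member list is unchanged.
    """
    if not members:
        raise ValueError("pool has no members")
    # Normalise the address (IPv4 or IPv6) to its packed byte form.
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha256(packed).digest()
    # Reduce the first 8 bytes of the hash to an index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(members)
    return members[index]


if __name__ == "__main__":
    # Documentation-range addresses standing in for real pool members.
    pool = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]
    for resolver in ("192.0.2.100", "192.0.2.200", "2001:db8::53"):
        print(resolver, "->", pick_member(resolver, pool))
```

With a fixed pool, a given resolver address always lands on the same member,
so it sees consistent behaviour during a partial rollout. A plain modulo like
this remaps many clients whenever the pool size changes; a consistent-hashing
scheme would reduce that, at the cost of more code.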

There's also a bit of a reminder for operators of recursive resolvers:

------------
### Separate State for Each of the Recursive Resolver's Own IP Addresses {#resolver-binding}

Note that the recursive resolver should record this per-authoritative-IP 
state for each IP address it uses as it sends its queries.
For example, if a recursive resolver can send a packet to authoritative servers 
from IP addresses 192.0.2.100 and 192.0.2.200, it should keep two distinct sets 
of per-authoritative-IP state, one for each source address it uses.
Keeping these state tables distinct for each source address makes it possible 
for a pooled authoritative server behind a load balancer to do a partial 
rollout while minimizing accidental timeouts (see {{authoritative-pools}}).
------------
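
For implementers, the per-source-address bookkeeping described in that
excerpt might look roughly like the sketch below. This is only an
illustration and not proposed for the draft itself: the class names
`TransportState` and `ProbeStateTable` and the two fields shown are
hypothetical stand-ins for whatever per-authoritative-IP state the resolver
actually keeps.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TransportState:
    # Hypothetical fields, standing in for the real per-authoritative-IP state.
    encrypted_ok: Optional[bool] = None  # None means "not probed yet"
    last_probe: Optional[float] = None   # monotonic timestamp of last probe


class ProbeStateTable:
    """One independent per-authoritative-IP table per source address."""

    def __init__(self) -> None:
        # source IP -> (authoritative IP -> state)
        self._by_source: Dict[str, Dict[str, TransportState]] = defaultdict(dict)

    def get(self, source_ip: str, auth_ip: str) -> TransportState:
        # Creates a fresh, unprobed entry on first lookup.
        return self._by_source[source_ip].setdefault(auth_ip, TransportState())

    def record(self, source_ip: str, auth_ip: str, ok: bool,
               now: Optional[float] = None) -> None:
        state = self.get(source_ip, auth_ip)
        state.encrypted_ok = ok
        state.last_probe = time.monotonic() if now is None else now


if __name__ == "__main__":
    table = ProbeStateTable()
    # A probe from 192.0.2.100 succeeds; nothing is learned for 192.0.2.200.
    table.record("192.0.2.100", "198.51.100.1", ok=True)
    print(table.get("192.0.2.100", "198.51.100.1"))
    print(table.get("192.0.2.200", "198.51.100.1"))
```

Keying on the source address first means a result learned from 192.0.2.100
never influences queries sent from 192.0.2.200, which is what lets a load
balancer that maps by client IP present a consistent view to each source
address during a partial rollout.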

We'll include something along these lines in draft -01.  If you'd like
to propose fixes or raise concerns, Joey and I would be happy to
incorporate them.

            --dkg


_______________________________________________
dns-privacy mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dns-privacy
