[DNSOP] Re: New draft: draft-powers-dnsop-expire-00 —EXPIRE opcode

Duane Powers Mon, 24 Nov 2025 11:14:05 -0800

Hi Andrew,

Thank you for laying out the concerns so clearly; I think you’ve captured the
main questions people have about EXPIRE. For brevity, I'll roll the couple of
emails I’ve seen into this reply.


To clarify the trust model: for internal resolvers, TSIG would likely be the
primary mechanism (IP ACLs might suffice in trusted environments). These
are the normal control surfaces, so I’d expect operators to add a TSIG key
to the resolver configuration the same way they do for other authenticated
operations.

For resolvers that are not under your administrative control, DNSSEC is the
other path. If a resolver validates DNSSEC and implements EXPIRE, it can 
authenticate the operation cryptographically with no prior relationship or 
coordination. This covers the common scenario where you observe stale
data at a resolver you don’t operate.

So, the model is operator --> resolver, authenticated by TSIG within your
own realm and by DNSSEC everywhere else. The authoritative nameserver
is not involved at all, except in signing the EXPIRE RRset.
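To make the two authentication paths concrete, here is a rough sketch of how a resolver might decide whether to honor an EXPIRE request. It is a toy model under my own assumptions, not text from the draft: the `ExpireRequest` fields and `authorize_expire()` are hypothetical, a plain HMAC-SHA256 over the request bytes stands in for real TSIG (an RFC 8945 MAC covers the whole message plus the TSIG variables), and `dnssec_validated` assumes the resolver has already validated the signed EXPIRE RRset with its normal DNSSEC machinery.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


# Hypothetical request model; these field names are illustrative and are NOT
# taken from draft-powers-dnsop-expire-00.
@dataclass
class ExpireRequest:
    qname: str
    payload: bytes                      # bytes covered by the MAC in this toy model
    tsig_key_name: Optional[str] = None
    tsig_mac: Optional[bytes] = None
    dnssec_validated: bool = False      # assumed: set by the resolver's own validator


def authorize_expire(req: ExpireRequest, keyring: dict[str, bytes]) -> str:
    """Return "tsig" or "dnssec" for the path that authorized req, else "refused"."""
    # Path 1: operator inside the resolver's own realm, using a shared key.
    # ASSUMPTION: plain HMAC-SHA256 over payload stands in for a real
    # RFC 8945 TSIG MAC, which covers the whole message plus TSIG variables.
    if req.tsig_key_name is not None:
        secret = keyring.get(req.tsig_key_name)
        if secret is None or req.tsig_mac is None:
            return "refused"            # unknown key or missing MAC
        expected = hmac.new(secret, req.payload, hashlib.sha256).digest()
        if hmac.compare_digest(expected, req.tsig_mac):
            return "tsig"
        return "refused"                # a bad MAC is a hard failure, no fallback
    # Path 2: no prior relationship; rely on the zone's signature over the
    # EXPIRE RRset, validated by the resolver's normal DNSSEC machinery.
    if req.dnssec_validated:
        return "dnssec"
    return "refused"
```

Treating a present-but-bad TSIG as a hard refusal, rather than falling through to the DNSSEC path, is a deliberate choice in the sketch: a misconfigured or forged key should surface as an error instead of being silently rescued by the other path.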

On the “two-class resolver” concern: I actually see EXPIRE narrowing that
divide, not creating it. Today the large operators already have their own purge APIs
and resolver-specific control channels. Everyone else just waits out the
TTL. That division exists right now.

A standardized mechanism gives smaller operators a tool they’ve never
had. And operationally, it flips the math a bit: if EXPIRE becomes a normal
part of coordinated recovery, the resolvers that don’t support it will be the
ones holding stale data long after everyone else has moved on. That isn’t
privileging anyone; it’s just aligning with the operational incentives that
already exist around outage duration and recovery time.

In another mail, Paul Vixie mentioned EDNS0 signaling, and I think that would
be a natural extension (along the lines of how NSID works).
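For the EDNS0 idea, the NSID pattern (RFC 5001) maps over fairly directly: the client sends an empty option in its query, and a server that supports the feature includes that option in its response. Below is a minimal sketch of the wire handling, assuming a hypothetical option code taken from the Local/Experimental range in RFC 6891 (65001-65534), since no code point has been assigned; `EXPIRE_SUPPORT_OPT` and the helper names are mine, for illustration only.

```python
import struct

# HYPOTHETICAL option code from the RFC 6891 Local/Experimental range
# (65001-65534); nothing has been assigned for EXPIRE signaling.
EXPIRE_SUPPORT_OPT = 65001


def encode_option(code: int, data: bytes = b"") -> bytes:
    """Encode one EDNS0 option: OPTION-CODE, OPTION-LENGTH, OPTION-DATA."""
    return struct.pack("!HH", code, len(data)) + data


def parse_options(rdata: bytes) -> dict[int, bytes]:
    """Split OPT RR RDATA into {option_code: option_data}."""
    options: dict[int, bytes] = {}
    offset = 0
    while offset + 4 <= len(rdata):
        code, length = struct.unpack_from("!HH", rdata, offset)
        offset += 4
        if offset + length > len(rdata):
            raise ValueError("truncated EDNS0 option")
        options[code] = rdata[offset:offset + length]
        offset += length
    if offset != len(rdata):
        raise ValueError("trailing bytes in OPT RDATA")
    return options


def resolver_supports_expire(response_opt_rdata: bytes) -> bool:
    """NSID-style check: a supporting resolver echoes the option back."""
    return EXPIRE_SUPPORT_OPT in parse_options(response_opt_rdata)
```

An operator's tooling would put `encode_option(EXPIRE_SUPPORT_OPT)` in the OPT record of an ordinary query and call `resolver_supports_expire()` on the OPT RDATA of the reply, so support can be probed without sending EXPIRE itself.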

Happy to continue refining the model in -01 based on this discussion.

Best,
Duane


> On Nov 21, 2025, at 13:47, Andrew Sullivan <[email protected]> wrote:
> 
> Dear colleagues,
> 
> On Fri, Nov 21, 2025 at 09:29:57AM -0500, Duane Powers wrote:
> 
>> vendor-specific purge mechanisms: the operator already knows the
>> resolvers they administer and has an established trust relationship with
>> them.
> 
> I think this is the crux of the issue with this draft.
> 
> Short of logging queries and keeping track of what resolvers asked for which 
> QNAMEs (something I very much suggest authoritative servers not do), an 
> authoritative server cannot possibly know what caches need to be invalidated. 
>  So, there's no way to build the list to EXPIRE except "guess blindly," which 
> effectively devolves to "contact the big resolvers we know."
> 
> If you already have a trust relationship with the resolver, then I think 
> we're talking about an operational model of the DNS that has not historically 
> been one the IETF has included in its modelling: 2-class resolver 
> arrangements.  In this model, some resolvers (either operated by the same 
> operator as the authoritative operator or by some other operator, it doesn't 
> matter which) have some kind of control channel arrangement between the 
> authoritative servers and the resolvers.  (Note this control channel could be 
> sneakernet.  I don't care the details: the point is that there is some 
> distinguishing characteristic of the resolvers that puts them in this class.) 
>  The rest of the resolvers have no control channel and must rely exclusively 
> on ordinary DNS query-response mechanisms.  (An easy way to imagine this 
> distinction is to ask whether a given DNS querier might get an answer to an 
> AXFR or IXFR query.  If so, and if that querier is a resolver, then it is in 
> the first class; otherwise, in the second.)
> 
> This two-class structure is, to be sure, an arrangement that someone might 
> make.  But it does not seem to me to be an ordinary case, and if that is the 
> only use case then it seems complicating the DNS protocol for this purpose 
> might not be the most natural way to handle it.  If that is _not_ the only 
> use case, then I don't see how the authoritative server knows which resolvers 
> to send EXPIRE to except for the obvious expediency of "contact the big ones 
> we know," which is subject to the complaints about facilitating concentration 
> that others have raised.  (I'm actually somewhat agnostic about this, since 
> it seems to me the Internet is dying in favour of a cable TV model anyway.  
> If we're doing engineering, maybe we need a standard behaviour to support the 
> 10 important players and everyone else gets B-grade service.)
> 
> It strikes me, however, that if one wanted this to be generally useful to the 
> Internet of many and varied network infrastructures, then a somewhat more 
> ambitious design would be needed.  Suppose you are an authoritative server 
> and you have a high-priority cache flush.  You know you had a long-lived 
> entry for an important RRset that you've had to change [let's call this 
> RR(I)], and that is going to cause various kinds of outages.  What you need 
> is a way to signal _any_ resolver R that RR(I), despite the TTL they had when 
> they got that RR(I), is now invalid.  Then, anyone who knows about the 
> problem could send a query towards R that would cause R somehow to fetch the 
> information that the cache entry for RR(I) is bad, and (supposing R supported 
> this feature) that could trigger R to expire the cache entry for RR(I).  The 
> next time R needs RR(I), it has no cache, so it makes the query and gets the 
> new, valid entry.
> 
> It would seem that one obvious way to do this would be to create a leaf zone 
> with an underscore label that would contain a new RRTYPE with information 
> about what entries need invalidation.  I was about to outline how to do it, 
> except that an important use case in the draft is the need to substitute NS 
> records when the name servers for a zone are under attack.  Obviously, then, 
> a leaf node won't work.  This is, if I may say so, yet another example where 
> a mechanism in the DNS to signal administrative arrangements that do not fall 
> strictly along hierarchical lines would have been useful, but DBOUND was a 
> failure.
> 
> Best regards,
> 
> A
> 
> -- 
> Andrew Sullivan
> [email protected]
> 
> _______________________________________________
> DNSOP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
