From: Manu Bretelle <[email protected]>
Date: Tuesday, February 13, 2024 at 19:03
To: Edward Lewis <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

First - why am I resisting this proposal?  I believe that for the sake of 
operations, development of protocols must trend towards simplicity.  I would 
add a flag or field when necessary and only then, lest it be forgotten (a 
burden with no benefit upon code maintainers) or worse a stumbling block 
(misused, mis-set, generally mis-understood).

On Tue, Feb 13, 2024 at 7:35 AM Edward Lewis 
<[email protected]> wrote:


>An operator dipping its toes with DELEG and encrypted protocols may be willing 
>to signal to a resolver that such failures are likely operational failure 
>because this is a testing endpoint that may be unstable due to lack of 
>operational expertise. A privacy aware resolver can then decide to fall back on 
>clear-text. Again, there is nothing preventing the resolver from failing hard here, 
>this is out of the control of the auth server operator. All that can be done 
>is to "signal".

Wouldn’t the availability of the fallback transport be enough signal that the 
service operator does not have full faith in the preferred transport?  Having a 
separate flag is like having a second source of data: the two might be 
inconsistent, which is a generic form of root cause.

>I could also imagine an operator going through their first cert rotation to be 
>erring on the side of safety and switching to "testing" mode temporarily.

A bit of my concern is that sometimes we forget to remove the training wheels 
once we’ve learned.  A common error in operations is to forget the cleanup 
phase (remove old files, etc.) once new functionality has been proven.  This is 
a reason why I’m hesitant to support having a flag like this.

>If you look back at DNSSEC, had it been possible to turn DNSSEC in 
>"permissive" mode, would more operators have taken the leap to enable it 
>knowing that resolvers that would validate records would have been willing to 
>fall back while the flag is on? I think from an operational point of view, this 
>is something that can be of great help to build operational confidence and 
>expertise without taking the risk to break one's DNS.

Yes, yes it would.  Early on there was criticism that DNSSEC was “ok” or 
“fail”.  When operators messed up their key rotations (this happened quite 
often around 2010), there were calls to “purge caches” and even some thought 
given to automating a way for operators to initiate a global cache purge of 
their data.  (Failed, of course - there’s no way.)  This was followed by the 
development of negative trust anchors after the COMCAST/NASA.gov issue, 
something that was an uphill battle for operators to get documented in an IETF 
document.  More recently, an operator asked me about developing a new 
resource record type that could be published at a zone apex to signal that all 
validation of records signed by the apex keyset ought to be ignored.  (Sketched 
up, but not what the operator had in mind.)

Operators cite the great leap of risk as a reason not to implement DNSSEC.  The 
protocol design did not accommodate a soft introduction.  The levels of 
certainty are binary - thumbs up or thumbs down - thanks to the reliance on the 
DNS response code as the only error channel.

When I wrote a prototype validator during experimentation on DNSSEC, I realized 
that there were 50 or so if statements, any one of which would cause validation 
to fail.  Some of the if’s were likely transient, some persistent, and so on; 
this information would have informed the response.  But we didn’t have enough 
bandwidth (that response code field was all) to feed that back up the chain.  
We probably then ought to have defined an extended response code mechanism - 
which is now a work in progress in DNSOP, if I’m right.

In summary - I think this flag would be redundant to the availability of a 
means to fall back.  Basing the justification on a “testing phase” assumes that 
it is a distinct phase with a declared ending - which I don’t believe is often 
true.  And I think we do need to build in a way for the risk of adoption 
(initial or otherwise) to be lower.  One way is via better feedback; other ways 
are via abilities to “test-in-prod” (“immediate trial period, when staff is 
able to watch it launch before leaving for lunch”) and so on.
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop