From: Manu Bretelle <[email protected]>
Date: Tuesday, February 13, 2024 at 19:03
To: Edward Lewis <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

First - why am I resisting this proposal? I believe that for the sake of operations, development of protocols must trend towards simplicity. I would add a flag or field when necessary and only then, lest it be forgotten (a burden with no benefit upon code maintainers) or worse, become a stumbling block (misused, mis-set, generally misunderstood).

On Tue, Feb 13, 2024 at 7:35 AM Edward Lewis <[email protected]> wrote:

>An operator dipping its toes with DELEG and encrypted protocols may be willing
>to signal to a resolver that such failures are likely operational failure
>because this is a testing endpoint that may be unstable due to lack of
>operational expertise. A privacy aware resolver can then decide to fallback on
>clear-text. Again, there is nothing preventing the resolver to fail hard here,
>this is out of the control of the auth server operator. All that can be done
>is to "signal".

Wouldn’t the availability of the fallback transport be enough of a signal that the service operator does not have full faith in the preferred transport? Having a separate flag is like having a second source of data: the two might be inconsistent, which is a generic root cause of failures.

>I could also imagine an operator going through their first cert rotation to be
>erring on the side of safety and switching to "testing" mode temporarily.

A bit of my concern is that sometimes we forget to remove the training wheels once we’ve learned. A common error in operations is to forget the cleanup phase (remove old files, etc.) once new functionality has been proven. This is a reason why I’m hesitant to support having a flag like this.

>If you look back at DNSSEC, had it been possible to turn DNSSEC in
>"permissive" mode, would more operators have taken the leap to enable it
>knowing that resolvers that would validate records would have been willing to
>fallback while the flag is on? I think from an operational point of view, this
>is something that can be of great help to build operational confidence and
>expertise without taking the risk to break one's DNS.

Yes, yes it would. Early on there was criticism that DNSSEC was “ok” or “fail”. When operators messed up their key rotations (this happened quite often around 2010), there were calls to “purge caches” and even some thought given to automating a way for operators to initiate a global cache purge of their data. (Failed, of course - there’s no way.) This was followed by the development of negative trust anchors after the COMCAST/NASA.gov issue, something operators fought an uphill battle to get documented in an IETF document. More recently, an operator asked me about developing a new resource record type that could be published at a zone apex to signal that all validation records signed by the apex keyset ought to be ignored. (Sketched up, but not what the operator had in mind.)

Operators list the great leap of risk as a reason not to implement DNSSEC. The protocol design did not accommodate a soft introduction. The levels of certainty are binary - thumbs up or thumbs down - thanks to the reliance on the DNS response code as the only error channel.

When I wrote a prototype validator during experimentation on DNSSEC, I realized that there were 50 or so if statements, any one of which would cause validation to fail. Some of the failures behind those ifs were likely transient, some persistent, and so on; this information would have informed the response. But we didn’t have enough bandwidth (that response code field was all) to feed that back up the chain. We probably then ought to have defined an extended response code mechanism - which is now a work in progress in DNSOP, if I’m right.

In summary - I think this flag would be redundant to the availability of a means to fall back. Basing the justification on a “testing phase” assumes that it is a distinct phase with a declared ending - which I don’t believe is often true. And I think we do need to build in ways to lower the risk of adoption (initial or otherwise): one way is better feedback, others are abilities to “test-in-prod” (“immediate trial period, when staff is able to watch it launch before leaving for lunch”) and so on.

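
As a rough illustration of the "not enough bandwidth" point above, here is a minimal Python sketch of how distinct validation failures collapse into one response code, and how a richer feedback channel could keep them apart. The class and function names (`CheckFailure`, `rcode_only`, `with_detail`) and the two example failures are hypothetical, made up for this illustration; they are not taken from any real validator or from a specification. Only the value 2 for SERVFAIL is a real DNS RCODE.

```python
from dataclasses import dataclass
from enum import Enum


class Persistence(Enum):
    """Hypothetical classification of a failed validation check."""
    TRANSIENT = "transient"
    PERSISTENT = "persistent"


@dataclass(frozen=True)
class CheckFailure:
    """One failed check inside a (hypothetical) validator."""
    check: str
    persistence: Persistence


SERVFAIL = 2  # DNS RCODE 2 (Server Failure)


def rcode_only(failure: CheckFailure) -> int:
    """Response-code-only channel: every failure looks the same upstream."""
    return SERVFAIL


def with_detail(failure: CheckFailure) -> tuple[int, str]:
    """Richer channel (assumed for illustration): same RCODE plus the reason."""
    return SERVFAIL, f"{failure.check} ({failure.persistence.value})"


# Two illustrative failures with very different operational meaning.
failures = [
    CheckFailure("signature expired", Persistence.PERSISTENT),
    CheckFailure("DNSKEY fetch timed out", Persistence.TRANSIENT),
]

for f in failures:
    print(rcode_only(f), with_detail(f))
```

With only the response code, both failures are indistinguishable to whoever receives the answer; with the extra detail, a client or operator could in principle tell a retry-worthy condition from a misconfiguration.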
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop
