Automation isn't a solution in and of itself.  When I recently mentioned, 
during a panel discussion, that automation is essential (for scalability), an 
operator on the same panel responded that automation is also a great way to 
scale problems.  Automation is needed, but it has to be implemented correctly 
and rely on good heuristics when it can't be deterministic.  Automation 
contributes to resiliency insofar as it addresses "fat-fingering" and "forgot 
to do it," but it won't address systemic issues.  Automation won't fix 
weaknesses, but done appropriately it enables scalability and contributes to 
stability.  (Much the same as "rebooting" never fixes problems, but it does 
make them go away - for a while.)

Nevertheless, the protocol definition has to expect, and react appropriately 
to, benign operational errors.  What this means is that the protocol 
definition needs to include features a receiver can use, when expectations 
are not met, to determine how to react.  In the first DNSSEC validator (circa 
1998), there were 50-100 different error codes: some indicated transient 
problems, some persistent, some suspicious, some superficial.  The problem 
was that only SERVFAIL was available to signal an error, a well-known knock 
on the DNSSEC design.  In a perfect world, the protocol definition would not 
give rise to mistakes, and a design ought to be graded on how far it goes 
toward that goal, but there will never be a perfect world.

As far as deployment goes, I think measurements of it ought to be integral to 
judging how well a protocol is designed.  I attended part of the 
"Evolvability, Deployability, & Maintainability" (EDM) session at IETF 118 
and joined the mailing list to make that point, but have heard no reaction.  
The discussion was focused only on seeing multiple implementations, falling 
short of examining whether anyone made use of the code (paths).  Deployment, 
to me, is how the field of operations grades a protocol definition.

On 2/1/24, 07:49, "DNSOP on behalf of Peter Thomassen" <dnsop-boun...@ietf.org 
on behalf of pe...@desec.io> wrote:

    On 2/1/24 13:34, Edward Lewis wrote:
    > The proper response will depend on the reason - more accurately the
    > presumed (lacking any out-of-band signals) reason - why the record
    > is absent.

    Barring any other information, the proper response should IMHO not
    depend on the presumed reason, but assume the worst case. Anything
    else would break expected security guarantees.

    > From observations of the deployment of DNSSEC, [...]
    > It’s very important that a secured protocol be able to thwart or
    > limit damage due to malicious behavior, but it also needs to
    > tolerate benign operational mistakes.  If mistakes are frequent and
    > addressed by dropping the guard, then the security system is a
    > wasted investment.

    That latter sentence seems right to me, but it doesn't follow that
    the protocol needs to tolerate "benign operational mistakes".

    Another approach would be to accompany protocol deployment with a
    suitable set of automation tools, so that the chance of operational
    mistakes goes down.  That would be my main take-away from DNSSEC
    observations.

    In other words, perhaps we should consider a protocol incomplete if
    the spec doesn't easily accommodate automation, and deployment
    without it would yield significant operational risk.

    Let's try to include automation aspects from the beginning.

    Peter

    -- 
    https://desec.io/


_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
