On Tue, Mar 13, 2018 at 6:27 PM Matthew Hardeman <mharde...@gmail.com>
wrote:

> Another question this incident raised in my mind pertains to the parallel
>>> staging and production environment paradigm:  If one truly has the
>>> 'courage
>>> of conviction' of the equivalence of the two environments, why would one
>>> not perform all tests in ONLY the staging environment, with no tests and
>>> nothing other than production transactions on the production environment?
>>> That tests continue to be executed in the production environment while
>>> holding to the notion that a fully parallel staging environment is the
>>> place for tests seems to signal that confidence in the staging
>>> environment
>>> is -- in some measure, however small -- limited.
>>
>>
>> That's ... just a bad conclusion, especially for a publicly-trusted CA :)
>>
>>
> I certainly agree it's possible that I've reached a bad conclusion there,
> but I would like to better understand how specifically?  Assuming the same
> input data set and software manipulating said data, two systems should in
> general execute identically.  To the extent that they do not, my initial
> position would be that a significant failing of change management of
> operating environment or data set or system level matters has occurred.  I
> would think all of those would be issues of great concern to a CA, if for
> no other reason than that they should be very very rare.
>

I get the impression you may not have run complex production systems,
especially distributed systems, or spent much time with testing
methodology, given statements such as “courage of conviction.”

No testing system is going to be perfect, and there’s a difference between
designed redundancy and unnecessary testing.

For example, even if you had 100% code coverage through tests, there are
still things that are possible to get wrong - for example, you could test
every line of your codebase and still fail to properly handle IDNs, for
example - or, as other CAs have shown, ampersands.

It’s foolish to think that a staging environment will cover every possible
permutation - even if you solved the halting problem, you will still have
issues with, say, solar radiation induced bitflips, or RAM heat, or any
number of other issues. And yes, these are issues still affecting real
systems today, not scary stories we tell our SREs to keep them up at night.

Look at any complex system - avionics, military command-and-control,
certificate authorities, modern scalable websites - and you will find
systems designed with redundancy throughout, to ensure proper functioning.
It is the madness of inexperience to suggest that somehow this redundancy
is unnecessary or somehow a black mark - the Sean Hannity approach of “F’
it, we’ll do it live” is the antithesis of modern and secure design. The
suggestion that this is somehow a sign of insufficient testing or design
is, at best, naive, and at worst, detrimental towards discussions of how to
improve the ecosystem.

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy