On Tue, Mar 13, 2018 at 6:27 PM Matthew Hardeman <mharde...@gmail.com> wrote:

>>> Another question this incident raised in my mind pertains to the parallel staging and production environment paradigm: If one truly has the 'courage of conviction' of the equivalence of the two environments, why would one not perform all tests in ONLY the staging environment, with no tests and nothing other than production transactions on the production environment? That tests continue to be executed in the production environment while holding to the notion that a fully parallel staging environment is the place for tests seems to signal that confidence in the staging environment is -- in some measure, however small -- limited.
>>
>> That's ... just a bad conclusion, especially for a publicly-trusted CA :)
>
> I certainly agree it's possible that I've reached a bad conclusion there, but I would like to better understand how specifically? Assuming the same input data set and software manipulating said data, two systems should in general execute identically. To the extent that they do not, my initial position would be that a significant failing of change management of operating environment or data set or system level matters has occurred. I would think all of those would be issues of great concern to a CA, if for no other reason than that they should be very very rare.

I get the impression you may not have run complex production systems, especially distributed systems, or spent much time with testing methodology, given statements such as "courage of your conviction." No testing system is going to be perfect, and there's a difference between designed redundancy and unnecessary testing. For example, even if you had 100% code coverage through tests, there are still things that are possible to get wrong - you could test every line of your codebase and still fail to properly handle IDNs, for example - or, as other CAs have shown, ampersands.

It's foolish to think that a staging environment will cover every possible permutation - even if you solved the halting problem, you will still have issues with, say, solar radiation induced bitflips, or RAM heat, or any number of other issues. And yes, these are issues still affecting real systems today, not scary stories we tell our SREs to keep them up at night. Look at any complex system - avionics, military command-and-control, certificate authorities, modern scalable websites - and you will find systems designed with redundancy throughout, to ensure proper functioning. It is the madness of inexperience to suggest that somehow this redundancy is unnecessary or somehow a black mark - the Sean Hannity approach of "F' it, we'll do it live" is the antithesis of modern and secure design. The suggestion that this is somehow a sign of insufficient testing or design is, at best, naive, and at worst, detrimental towards discussions of how to improve the ecosystem.

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy