On Tuesday, March 13, 2018 at 2:02:45 PM UTC-7, Ryan Sleevi wrote: > I'm hoping that LE can provide more details about the change management > process and how, in light of this incident, it may change - both in terms > of automated testing and in certificate policy review.
Forgot to reply to this specific part. Our change management process starts with our SDLC, which mandates code review (typically dual code review), unit tests, and where appropriate, integration tests. All unittests and integrations tests are run automatically with every change, and before every deploy. Our operations team checks the automated test status and will not deploy if the tests are broken. Any configuration changes that we plan to apply in staging and production are first added to our automated tests. Each deploy then spends a period of time in our staging environment, where it is subject to further automated tests: periodic issuance testing, plus performance, availability, and correctness monitoring equivalent to our production environment. This includes running the cert-checker software I mentioned earlier. Typically our deploys spend two days in our staging environment before going live, though that depends on our risk evaluation, and hotfix deploys may spend less time in staging if we have high confidence in their safety. Similarly, any configuration changes are applied to the staging environment before going to production. For significant changes we do additional manual testing in the staging environment. Generally this testing means checking that the new change was applied as expected, and that no errors were produced. We don't rely on manual testing as a primary way of catching bugs; we automate everything we can. If the staging deployment or configuration change doesn't show any problems, we continue to production. Production has the same suite of automated live tests as staging. And similar to staging, for significant changes we do additional manual testing. It was this step that caught the encoding issue, when one of our staff used crt.sh's lint tool to double check the test certificate they issued. Clearly we should have caught this earlier in the process. The changes we have in the pipeline (integrating certlint and/or zlint) would have automatically caught the encoding issue at each staging in the pipeline: in development, in staging, and in production. _______________________________________________ dev-security-policy mailing list email@example.com https://lists.mozilla.org/listinfo/dev-security-policy