On Tuesday, March 13, 2018 at 2:02:45 PM UTC-7, Ryan Sleevi wrote:
> I'm hoping that LE can provide more details about the change management
> process and how, in light of this incident, it may change - both in terms
> of automated testing and in certificate policy review.

Forgot to reply to this specific part. Our change management process starts 
with our SDLC, which mandates code review (typically dual code review), unit 
tests, and where appropriate, integration tests. All unit tests and integration 
tests are run automatically with every change, and before every deploy. Our 
operations team checks the automated test status and will not deploy if the 
tests are broken. Any configuration changes that we plan to apply in staging 
and production are first added to our automated tests.
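
To make the "will not deploy if the tests are broken" rule concrete, here is a minimal sketch of such a gate. It is illustrative only and does not reflect the actual deployment tooling; the suite names and the may_deploy helper are hypothetical.

```python
from typing import Mapping

# Hypothetical suite names; a real pipeline would define its own.
REQUIRED_SUITES = ("unit", "integration")


def may_deploy(results: Mapping[str, bool]) -> bool:
    """Return True only if every required suite ran and passed.

    A suite that is missing from the results counts as broken, so a
    partial test run can never unlock a deploy.
    """
    return all(results.get(suite) is True for suite in REQUIRED_SUITES)
```

Treating a missing result as a failure is a deliberate choice in this sketch: the gate should fail closed rather than assume an unreported suite passed.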

Each deploy then spends a period of time in our staging environment, where it 
is subject to further automated tests: periodic issuance testing, plus 
performance, availability, and correctness monitoring equivalent to our 
production environment. This includes running the cert-checker software I 
mentioned earlier. Typically our deploys spend two days in our staging 
environment before going live, though that depends on our risk evaluation, and 
hotfix deploys may spend less time in staging if we have high confidence in 
their safety. Similarly, any configuration changes are applied to the staging 
environment before going to production. For significant changes we do 
additional manual testing in the staging environment. Generally this testing 
means checking that the new change was applied as expected, and that no errors 
were produced. We don't rely on manual testing as a primary way of catching 
bugs; we automate everything we can.

If the staging deployment or configuration change doesn't show any problems, we 
continue to production. Production has the same suite of automated live tests 
as staging. And similar to staging, for significant changes we do additional 
manual testing. It was this step that caught the encoding issue, when one of 
our staff used crt.sh's lint tool to double-check the test certificate they 
issued.

Clearly we should have caught this earlier in the process. The changes we have 
in the pipeline (integrating certlint and/or zlint) would have automatically 
caught the encoding issue at each stage in the pipeline: in development, in 
staging, and in production.
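
As a rough sketch of what an automated lint gate at each stage might look like: the code below assumes the linter emits a JSON object mapping each lint name to an entry with a "result" field, and that "error" and "fatal" are the levels that should block. That report shape, the lint names in the usage note, and both helper functions are assumptions for illustration, not the actual certlint or zlint integration.

```python
import json
from typing import List

# Assumed result levels that should stop a certificate from passing the gate.
BLOCKING = {"error", "fatal"}


def blocking_lints(report_json: str) -> List[str]:
    """Return the sorted names of lints whose result is a blocking level."""
    report = json.loads(report_json)
    return sorted(
        name for name, entry in report.items()
        if entry.get("result") in BLOCKING
    )


def check_at_stage(stage: str, report_json: str) -> None:
    """Raise if the lint report for a test certificate contains blocking results."""
    failed = blocking_lints(report_json)
    if failed:
        raise RuntimeError(
            f"{stage}: certificate failed lints: {', '.join(failed)}"
        )
```

The same check_at_stage call could be run with stage set to "development", "staging", or "production" against a freshly issued test certificate's lint report, so that a report containing, say, a hypothetical "e_example_encoding" lint with result "error" stops the pipeline at the first stage that sees it.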
_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy