On Wed, Mar 13, 2013 at 9:24 PM, Jay Ashworth <j...@baylink.com> wrote:

> ----- Original Message -----
> > From: "Ryan Lane" <rlan...@gmail.com>
>
> > > Hey, Ryan; did you see, perhaps on outages-discussion, the after action
> > > report from Microsoft about how their Azure SSL cert expiration screwup
> > > happened?
>
> > What's the relevance here?
>
> "Does ops have a procedure for avoiding unexpected SSL cert expirations,
> and does this affect it in any way other than making it easier to
> implement?",
> I would think...
>
>
We didn't have a certificate expiration. We replaced the individual
certificates, served for the different top-level domains, with a single
unified certificate. This change was meant to fix the certificate errors
shown on all non-Wikipedia domains to HTTPS mobile users, who were being
served the *.wikipedia.org certificate for every domain.

The unified certificate was missing six Subject Alternative Names:
mediawiki.org, *.mediawiki.org, m.mediawiki.org, *.m.mediawiki.org,
m.wikipedia.org and *.m.wikipedia.org. Shortly after deploying the
certificate we noticed it was bad and reverted the affected services
(mediawiki.org and mobile) to their individual certificates. The change
affected only a small portion of users for a short period of time.

As you may have noticed, I've already mentioned how we'll avoid, and more
quickly detect, problems like this in the future:

"Needless to say I'll be writing a script that can be run against a cert to
ensure it's not missing anything. We'll also be adding monitoring to check
for invalid certificates for any top level domain."
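
For illustration only, here is a rough sketch of what such a check might
look like. It is not the actual script. The function names, the REQUIRED
list and the example certificate are all hypothetical, and it assumes the
certificate's names are available in the dict shape that Python's
ssl.SSLSocket.getpeercert() returns. It compares names literally, so a
wildcard entry does not stand in for the bare domain.

```python
def sans_from_peercert(cert):
    """Return the DNS names from a dict shaped like ssl.SSLSocket.getpeercert() output."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def missing_sans(cert_sans, required_names):
    """Return the required names that are not literally listed among the cert's SANs."""
    present = {name.strip().lower() for name in cert_sans}
    return [name for name in required_names if name.strip().lower() not in present]


if __name__ == "__main__":
    # Hypothetical expected list; a real one would cover every served domain.
    REQUIRED = [
        "wikipedia.org", "*.wikipedia.org",
        "m.wikipedia.org", "*.m.wikipedia.org",
        "mediawiki.org", "*.mediawiki.org",
        "m.mediawiki.org", "*.m.mediawiki.org",
    ]
    # Hypothetical certificate data, standing in for a parsed certificate.
    example_cert = {
        "subjectAltName": (("DNS", "wikipedia.org"), ("DNS", "*.wikipedia.org")),
    }
    for name in missing_sans(sans_from_peercert(example_cert), REQUIRED):
        print("missing SAN:", name)
```

Run before deployment, a non-empty result would flag the certificate as
incomplete. The same comparison could be reused by a monitoring check.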

- Ryan
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
