On Wed, Mar 13, 2013 at 9:24 PM, Jay Ashworth <j...@baylink.com> wrote:
> ----- Original Message -----
> > From: "Ryan Lane" <rlan...@gmail.com>
>
> > > Hey, Ryan; did you see, perhaps on outages-discussion, the after action
> > > report from Microsoft about how their Azure SSL cert expiration screwup
> > > happened?
> >
> > What's the relevance here?
>
> "Does ops have a procedure for avoiding unexpected SSL cert expirations,
> and does this affect it in any way other than making it easier to
> implement?",
> I would think...
>

We didn't have a certificate expiration. We replaced all individual
certificates, delivered by different top level domains, with a single
unified certificate. This change was to fix certificate errors being
shown on all non-wikipedia domains for HTTPS mobile users, who were
being delivered the *.wikipedia.org certificate for all domains.

The unified certificate was missing 6 Subject Alternative Names:
mediawiki.org, *.mediawiki.org, m.mediawiki.org, *.m.mediawiki.org,
m.wikipedia.org and *.m.wikipedia.org. Shortly after deploying the
certificate we noticed it was bad and reverted the affected services
(mediawiki.org and mobile) back to their individual certificates.

The change only affected a small portion of users for a short period
of time. If you notice, I've already mentioned how we'll avoid and more
quickly detect problems like this in the future:

"Needless to say I'll be writing a script that can be run against a
cert to ensure it's not missing anything. We'll also be adding
monitoring to check for invalid certificates for any top level domain."

- Ryan
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
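
For illustration only, here is a rough sketch of the kind of check the
message describes: comparing the Subject Alternative Names on a
certificate against the list it is expected to carry. This is not the
script used by Wikimedia ops. The function names (sans_from_peercert,
missing_sans, covers) are hypothetical, and the only library assumption
is that the certificate is available in the dict form returned by
Python's ssl.SSLSocket.getpeercert().

```python
def sans_from_peercert(cert):
    """Extract DNS names from a dict shaped like ssl.SSLSocket.getpeercert() output."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def missing_sans(cert_sans, expected_sans):
    """Return the expected SAN entries that are absent from the cert, in order."""
    present = {name.lower() for name in cert_sans}
    return [name for name in expected_sans if name.lower() not in present]


def covers(san, hostname):
    """True if a single SAN entry matches hostname.

    A leading '*' is treated as matching exactly one left-most label,
    so '*.wikipedia.org' does not match 'en.m.wikipedia.org'.
    """
    san, hostname = san.lower(), hostname.lower()
    if san.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == san[2:]
    return san == hostname


if __name__ == "__main__":
    # Hypothetical certificate contents, for demonstration only.
    cert = {"subjectAltName": (("DNS", "wikipedia.org"), ("DNS", "*.wikipedia.org"))}
    expected = [
        "wikipedia.org", "*.wikipedia.org",
        "mediawiki.org", "*.mediawiki.org",
        "m.mediawiki.org", "*.m.mediawiki.org",
        "m.wikipedia.org", "*.m.wikipedia.org",
    ]
    for name in missing_sans(sans_from_peercert(cert), expected):
        print("missing SAN:", name)
```

The exact-entry comparison in missing_sans mirrors the failure described
above, where specific names were left off the unified certificate. The
covers helper shows why an entry such as *.m.wikipedia.org has to be
listed separately: a wildcard stands in for one label only.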