Musings on mass key-compromise revocations

Matt Palmer via dev-security-policy Sat, 28 Mar 2020 01:12:09 -0700

I've been asked to provide some "big-picture" thoughts on how the process
for key compromise revocations works, doesn't work, and could be improved. 
This is based on the work that I've done over the past month or so,
requesting revocation of certificates which have had their private keys
disclosed by being posted to some public location.


This e-mail is intended to provide a summary of what I did and how it all
went, with a summary of the things that I came across which I feel could
stand to be improved.  Most of these improvements will likely come through
changes to the Mozilla or CCADB policies, or via changes to the BRs.  A
couple are things that CAs themselves need to take on board, as they aren't
"policy" matters as such, but are still issues of concern.

As the exact nature of how a problem may be solved will, no doubt, engender
no small amount of debate, I intend (unless someone tells me I'm once again
being a goose) to provide separate e-mail thread-starters on the details
of the issues I found, along with my proposals for how the issues might be
solved.  I will be providing a summary of the issues I found, but all the
gory minutiae will be provided in later e-mail threads.

One final thing before I get started: I'd like to give a big shout-out to
Rob Stradling and anyone else who is involved in making crt.sh happen, and
Sectigo for sponsoring its existence.  Everything I've done here would have
been a zillion times harder if there wasn't already a database of every
CT-logged certificate available for me to hammer on.  It is an Internet
Treasure.

So to kick things off, let's have some stats.

In the past month, I've requested revocation of over 2,800 certificates and
pre-certs[1], across 11 different "CA organisations" (I'm counting one "CA
organisation" as any number of issuing CAs that share a common problem
reporting address).  These certificates were using a total of 1,217 distinct
private keys.  These keys come from multiple sources, but based on an
analysis of a sample of around 3% of those keys, the overwhelming majority
come from GitHub repositories which were at one time -- and in many cases
still are -- publicly available.

As a bit of an aside, at the time of writing, there are a further 52 SPKIs,
representing an unknown number of certificates, for which I have yet to
request revocation from the relevant CAs.  These are keys which have entered
the pwnedkeys database since around the 23rd March.  In addition, since the
23rd, I've automatically requested revocation of 17 certificates (from 16
SPKIs) through Let's Encrypt's automated ACME-based revocation API (and also
deactivated about eight Let's Encrypt accounts whose private keys were posted
publicly...)

An interesting thing to do is to compare "issuance volume" against
"disclosed keys".  This isn't a reflection of the CAs themselves, because a
CA can't control what their customers do with their keys.  Given the
differences in issuance methodologies, target markets, and business
practices between CAs, it's worth taking a look at different CAs'
"disclosure rate", I guess you'd call it, and consider what impact, if any,
the differences between CAs' operations might have on the likelihood of
their customers disclosing their keys.

I've taken issuance volume as being the total number of unexpired
certificates (as of a few days ago) issued by a "CA organisation" (again,
all the issuing CAs that share a common problem reporting address). 
Pre-certs get in the way, unfortunately, but it's not trivial to say "only
count a pre-cert if there isn't a corresponding cert" in crt.sh, so we have
what we have.
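
A rough sketch of the "only count a pre-cert if there isn't a corresponding
cert" filter, in Python.  It assumes each CT entry has already been reduced
to a hypothetical (issuer, serial, is_precert) record (these field names are
illustrative, not crt.sh's schema), and that a pre-cert and its final cert
share an issuer and serial number, which ignores the case of a dedicated
precertificate signing CA.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    issuer: str       # hypothetical: issuer name or issuer key hash
    serial: str       # certificate serial number, as hex
    is_precert: bool  # True if the entry is a precertificate


def count_distinct_certs(entries):
    """Count each (issuer, serial) pair once, whether it was logged as a
    cert, a pre-cert, or both."""
    return len({(e.issuer, e.serial) for e in entries})


def precert_only(entries):
    """Return the pre-certs for which no corresponding final cert was seen."""
    final = {(e.issuer, e.serial) for e in entries if not e.is_precert}
    return [e for e in entries
            if e.is_precert and (e.issuer, e.serial) not in final]
```

With records in this shape, count_distinct_certs() gives a figure that is
not inflated by cert/pre-cert pairs.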

Of the 11 CA orgs that I sent at least one revocation request to, here are
their numbers:

CA org          SPKIs           Issued          Issued / SPKI

Digicert        832             34029920        40901
QuoVadis        23              73184           3181
GlobalSign      47              1650873         35124
DFN-Verein      3               52945           17648
GoDaddy         38              4264928         112234
Sectigo         128             41718165        325923
Entrust         6               576093          96015
SECOM           1               118748          118748
Certum          5               329047          65809
Let's Encrypt   133             122438321       920588
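
The last column can be reproduced from the other two.  A minimal sketch,
assuming the ratio is truncated rather than rounded (which is what the
figures above are consistent with); the dictionary simply restates the rows
of the table:

```python
# (SPKIs, unexpired certs issued), copied from the table above
TABLE = {
    "Digicert": (832, 34029920),
    "QuoVadis": (23, 73184),
    "GlobalSign": (47, 1650873),
    "DFN-Verein": (3, 52945),
    "GoDaddy": (38, 4264928),
    "Sectigo": (128, 41718165),
    "Entrust": (6, 576093),
    "SECOM": (1, 118748),
    "Certum": (5, 329047),
    "Let's Encrypt": (133, 122438321),
}


def issued_per_spki(spkis, issued):
    """Certificates issued per disclosed key, truncated to an integer."""
    return issued // spkis


# List CA orgs from the fewest issued certs per disclosed key to the most.
for org, row in sorted(TABLE.items(), key=lambda kv: issued_per_spki(*kv[1])):
    print(f"{org:<16}{issued_per_spki(*row):>8}")
```

A lower number means more disclosed keys relative to issuance volume.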

I took the opportunity, also, to look at a couple of larger CAs (by issuance
volume) which have had zero certificates with keys I found: pki.goog (issued
1284676), and Amazon (issued 2308004).  Clearly, the best approach to
avoiding key disclosure is never giving the subscriber the key in the first
place...

Finally, I thought it worth checking as to how many certificates were issued
after the private key was already known to have been compromised by being
included in the pwnedkeys database.  Based on the apparently common
behaviour of "request cert, then stick cert+key into a public GitHub repo",
I didn't think there would be many of these.  Turns out I was insufficiently
pessimistic:

> Found 931 certificates across 90 keys that were pwned before the cert was
> issued.

Again, certs+pre-certs make the apparent cert count bigger; the important
thing is that 90 keys -- or about 7% of the total number of customers who
were inconvenienced by sudden revocation and reissuance -- were *already
known* to be compromised before a cert was issued.  Those customers could
have been saved from inconvenience had their keys been checked for
pwnage prior to issuance.  That seems significant to me.
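
A count of this kind could be produced along the following lines.  This is
only a sketch under stated assumptions, not necessarily how the figure above
was derived: it uses notBefore as a stand-in for the issuance time (CAs may
backdate slightly), and the Cert record and the pwned_at mapping are
hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class Cert(NamedTuple):
    spki_fp: str          # hypothetical key identifier, e.g. a hash of the SPKI
    not_before: datetime  # start of the certificate's validity period


def issued_after_pwned(certs, pwned_at):
    """Return (cert count, distinct key count) for certs whose notBefore is
    later than the time their key was recorded as compromised.

    pwned_at maps spki_fp -> datetime at which the key became known-compromised.
    """
    late = [c for c in certs
            if c.spki_fp in pwned_at and c.not_before > pwned_at[c.spki_fp]]
    return len(late), len({c.spki_fp for c in late})
```

For scale, 90 of 1,217 keys is roughly 7.4%.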

As to the process of revoking so many certificates, overall I have to say I
was pleasantly surprised at the degree to which CAs were willing and able to
process revocation requests -- especially when, after a small round of
"test" revocation requests, to ensure that CAs seemed able to process what I
was sending, I dropped mass lists of a year's worth of accumulated keys on
CAs.  I know that CAs don't get to hear a lot of positive things from the
wider community, so consider this one of those rare occasions when someone
says something nice about y'all.  <grin>

However, not everything went smoothly.  Many of the issues weren't
contraventions of the current rules, but rather places where the rules could
definitely stand a bit of a tune-up.  On the couple of occasions when I felt
that a CA's actions were not in line with the BRs or Mozilla policy, I
submitted a detailed description of the facts of each case to this list. 

This e-mail and its followups are not intended to call out any particular
CA; I am intending here to speak to the "systemic" issues that I encountered
while on my quest.  Where I do mention a specific CA, it is simply because
it is a lot harder to follow if I keep writing "CA Betty" and "CA Fred"
everywhere.

I've managed to group the problems into three areas: Problems Reporting
(difficulties in even getting the reports to the right place), Revocation
Shenanigans (things that did not go according to plan during and after the
revocation process itself), and Interaction Intransigence (a few things that
I noticed in my dealings with CAs that would be really, *really* good if
they could be improved).

First off, in reporting problems, the biggest hurdle was figuring out where
to send the reports to.  CCADB contact details (as exposed by crt.sh's
"Report a problem with this certificate" link) were a good start, but not
all CAs had an e-mail address in there, and in some cases the contact form
that was linked to didn't look particularly "problem-report-y" -- it could
very well have gone straight to the sales team by the look of it.  I will
write up a separate proposal on the specific improvements I think could be
made around problem reporting contact details.

Because CCADB contact details aren't "binding" on CAs, I had to check each
contact e-mail address with the CPS in any case, and that showed that
finding a CPS was, in most cases, harder than it needed to be, and finding
one in a language I could understand was sometimes an adventure.  I will
write up a separate proposal on some possibilities for identifying
the relevant CPS given a certificate, and another on improving clarity
around English language versions of CPSes.

Also, as something of an aside, I noticed several CPSes did not mention
anything about third parties being able to request revocation in s4.9.2.  I
know, and we here all know, that the BRs trump a CPS, and the BRs say anyone
can do it, but I don't think a CPS should be saying things that are just
plain wrong, especially when it's not hard to do it right.

Moving on to revocation shenanigans, it is gratifying to be able to note
that there were no outright, blatant *failures* of revocation.  No CA
spectacularly dropped the ball, or completely failed to revoke masses of
certificates.  What there was, though, were a number of sub-optimal
interpretations and implementations of the BRs, leading to outcomes which
are clearly not in the best interests of the ecosystem.

Starting at the moment of reporting, it seems that there is a disconnect in
the interpretations that CAs and other parties in the ecosystem have
regarding when the 24 hour deadline for revocation starts.  I agree that the
wording in the BRs could be clearer, and I'll be writing up a proposal for
how I think those sections could be improved.

Once revocation was processed, I sometimes noticed a rather surprising delay
in the availability of the revocation information, especially in OCSP
responses.  Frankly, I'm a bit surprised that this happens, given that to
me, the requirements of s4.9.5 of the BRs (around "published revocation")
seem fairly clear.  I am planning on gathering more data around this issue
before I make any concrete proposals.

One of the things that I found quite frustrating was the "revocation
whack-a-mole" that I had to do -- having to send repeated revocation
requests for a given key to the same CA.  Issuance of certificates with
previously-reported private keys is not currently a BR violation; however,
failing to revoke those certificates within 24 hours after issuance is
considered a violation.  Since it requires a... shall we say, "nuanced"... 
reading of the BRs to discern this requirement, it is not surprising that
some CAs have not abided by it.  I will be writing up a proposal to clearly
forbid issuance of certificates for private keys which have previously been
reported to that CA as compromised, so that this issue won't happen in the
future.
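
To illustrate what such a check might look like on the CA side, here is a
minimal sketch.  It assumes reported keys are identified by the SHA-256 hash
of the DER-encoded SubjectPublicKeyInfo, which is one common way to name a
key independently of any certificate; the class and method names are
hypothetical, and a real CA would persist this list rather than hold it in
memory.

```python
import hashlib


class ReportedKeyList:
    """In-memory list of keys that have been reported as compromised."""

    def __init__(self):
        self._fingerprints = set()

    @staticmethod
    def fingerprint(spki_der: bytes) -> str:
        # Hash of the public key structure only, so every certificate for
        # the same key maps to the same fingerprint.
        return hashlib.sha256(spki_der).hexdigest()

    def report(self, spki_der: bytes) -> None:
        """Record a key once a compromise report has been validated."""
        self._fingerprints.add(self.fingerprint(spki_der))

    def may_issue(self, spki_der: bytes) -> bool:
        """Return False if the key in a new request was previously reported."""
        return self.fingerprint(spki_der) not in self._fingerprints
```

Calling may_issue() at the point a certificate request is received would
stop a reported key from being certified a second time by the same CA.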

I also noticed one case of "CA agility" from a compromised key recycler
(there may be others that haven't caught my attention).  After having had a
certificate revoked from underneath them, an enterprising individual got a
certificate from a different CA for the same key.  To be clear, in no way is
this a violation by either CA -- no amount of tortured interpretation could
conclude that the BRs or Mozilla policy has anything to say about this
situation.  Nevertheless, it demonstrates the lengths to which enterprising
people will go to implement terribly bad ideas.

I hesitate to make a proposal that CAs should be required to refuse issuance
for a key known to another CA to be compromised, as it may be interpreted as
an overt attempt to "spruik" the service I run, but it does seem that some
way to prevent a known-compromised key from travelling between CAs would not
be entirely without merit.  It would certainly be useful if there was some
way that CAs needed to accept notifications that "hey, this key could turn
up, and it's compromised".  Although as has previously been mentioned, there
are resource exhaustion issues to be considered in such a mechanism.

An occurrence I was *not* expecting was coming across a valid certificate
using a Debian weak key.  I generated a large number of weak keys to
populate the pwnedkeys database not on the assumption that an SSL
certificate would ever be caught, but rather for use in other contexts
unrelated to the WebPKI.  I have been led to believe that a proposal to
improve this aspect of the BRs is already on its way to the CA/Browser
Forum, and I look forward to seeing a meaningful improvement in that
section.

In concert with my (human-mediated) revocation notifications, I have been
sending semi-automated revocation requests to Let's Encrypt, using the ACME
protocol.  This has been extremely smooth and straightforward, and my life
-- and, I presume, the lives of the staff at the CAs I've reported
revocations to -- would be a lot easier if every CA had an equivalent
facility available.  I think this is so useful, in fact, that I have started
coding a program capable of receiving key compromise revocations and
forwarding them via e-mail, which will be released as open source when it is
in a fit state for deployment.  Because of my implementation work, I'm
hesitant to make a proposal to mandate the availability of an ACME-based
revocation mechanism, to avoid any appearance of a conflict of interest;
however, I think it would be a valuable addition to the ease of reporting
(and receiving) key compromise notifications if such a mandate were to be
proposed and adopted.
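
For readers unfamiliar with the mechanism: in ACME (RFC 8555, section 7.6) a
revocation request is a signed POST to the server's revokeCert URL, and the
request may be signed with the certificate's own private key, which is what
makes it usable for key compromise reports.  The sketch below builds only
the JSON payload; the JWS signing, nonce handling, and directory lookup are
deliberately omitted, and the function names are illustrative.

```python
import base64
import json

KEY_COMPROMISE = 1  # CRLReason "keyCompromise" from RFC 5280


def b64url(data: bytes) -> str:
    """base64url without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def revocation_payload(cert_der: bytes, reason: int = KEY_COMPROMISE) -> str:
    """Build the (unsigned) JSON payload of a revokeCert request."""
    return json.dumps({"certificate": b64url(cert_der), "reason": reason})
```

The resulting string would then be wrapped in a JWS signed by the
compromised key before being sent.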

Finally, I noticed a couple of problems which I don't think are "policy
worthy", but I'd like to mention them because they are things that I think
CAs really need to get a hold of.

The biggest thing that made me want to face-palm was CA staff asking for a
copy of the private key when a key compromise is reported.  Seriously, stop
this, *please*.  If I were a miscreant looking for juicy intel, a CA's
problem reporting inbox would definitely be something I'd be all up inside. 
Inviting -- nay, *instructing* -- people to e-mail private keys to such a
tasty, tasty target is not cool.

Also, please bear in mind that the person reporting a compromise to you
probably isn't getting paid for doing so, and is under no obligation to help
you out beyond providing suitable evidence to prove compromise.  I had
several CAs ask for any further info I had about the keys I had reported,
because in many cases these were significant web properties whose keys had
been compromised, and they were keen to determine how the compromise
occurred.  When I had the time, I shared what information I had, once I'd
confirmed that the certificates were revoked and/or the location of the key
was no longer available.  In most cases, I got some sort of thanks.  In
other cases, I got radio silence.  Guess which CAs won't be getting any more
of my time in the future, to provide them with information that solely
benefits them and their paying customers?

So anyway, that's everything I've got for now.  I hope that at least some
people found it interesting.  I'll start posting separate thread-starting
e-mails about specific things that I think need to be changed next week.  In
the meantime, feel free to ask questions, kibitz about my methods, or
otherwise comment upon what I've written up.

- Matt

[1] It is hard to give an exact number because filtering pre-certs vs "live"
    certs isn't trivial.  The vast majority of the certificates had both a
    certificate and a precertificate in the CT logs, so the actual number of
    "real" certs revoked is a little over half that number.  However, in
    some cases only a precertificate was found in CT, and in other cases
    only a live certificate was found, so the number of "live" certificates
    is definitely greater than half of the total number in my count.

