Re: MRSP 3.0: Issue #276: Delayed Revocation

Matt Palmer Thu, 09 Jan 2025 17:43:45 -0800

On Thu, Jan 09, 2025 at 09:45:26AM +0000, 'Rob Stradling' via 
[email protected] wrote:
> [Posting in a personal capacity, per 
> https://wiki.mozilla.org/CA/Policy_Participants]
> Ben wrote:
> > New CA Obligations:
> > - Maintain and test mass revocation plans annually, including the 
> > revocation of 30 randomly chosen certificates within a 5-day period.
>
> Please note that there are CAs that do correctly understand that
> "Mozilla does not grant exceptions" and that do always strive to
> adhere to the mandatory BR revocation deadlines.


I'm sure all CAs "strive", to greater or lesser degrees of both effort
and effectiveness, to adhere to the revocation requirements.
Nevertheless, issues do occur across even CAs that are presumably
striving hard, as the breadth of CAs represented in the list of historic
revocation failure bugzilla issues highlights.  The list of prominent
CAs who have had a delayed revocation incident appears much longer than
the list of prominent CAs that haven't.  I would not presume to accuse
every CA on that list of not striving to adhere to the mandatory
deadlines at the time of their respective incidents.

> Why should these
> rule-abiding CAs and their subscribers be burdened with this proposed
> random revocation requirement?  This seems unfair, in my view.

In general, anything that does not get exercised regularly tends to
atrophy, and so when it turns out to be needed, it does not perform as
well as would be hoped.

During the time I was actively monitoring for BR violations identified
by the Pwnedkeys Revokinator, I found (and reported) multiple cases,
from otherwise (presumably) rule-abiding CAs.  It was from this
experience that I first came up with the idea of proactive revocation
requests.  Even the best CAs have deficiencies and limitations in their
processes and systems, and it is only by regularly exercising the
processes and systems in a variety of ways that those deficiencies and
limitations will be identified, and hence (hopefully) fixed.

That's why I support random revocation requirements for *all* CAs --
because it is practically axiomatic that all CAs' systems will be
less-than-perfect, with problems that have lain dormant, and are only
identified by real-world, end-to-end testing.  And that's even before we
start considering the subscriber-level problems out there...

On the subject of being "burdened" by additional requirements, I'd
respond that all regulation is a "burden", and that's just part and
parcel of participating in a regulated industry.  CA behaviour has
highlighted a deficiency in the regulatory framework, and that is now
being addressed.  If there is a "lighter touch" regulation that could
achieve the same outcomes (identify deficiencies in CA revocation
processes before they turn into massive, real-world problems), I'm sure
Mozilla would seriously consider it.  But "don't enhance the regulations
because they would be a burden to the regulated" is not an effective
means of achieving safe and reliable systems.

> Martijn asked if it would be "fairer to only impose this random
> revocation requirement on those CAs that have actually had delayed
> revocation
> incidents"<https://www.mail-archive.com/[email protected]/msg01951.html>.
> I think that this would not only be fairer but might also act as a
> deterrent against CAs delaying revocations in the first place!

There is definitely something to be said for additional scrutiny on CAs
that have historically had problems with their revocation processes, in
a similar vein to "consent decrees" -- once you've demonstrated a
problem, we're going to watch you a bit more closely for a while
afterwards.  Along those lines, I'd suggest something like:

-----8<-----

In the 12 months following the closure of each delayed revocation
incident report, Mozilla will verify the CA's process and system
improvements by making a (verified in some appropriate manner)
revocation request to the CA, enumerating a selection of certificates or
other relevant identifiers, chosen in a manner at Mozilla's sole
discretion to reflect the characteristics of the original delayed
revocation incident, of the same magnitude of impact as the original
delayed revocation.

The CA will be expected to process all of the listed identifiers within
the BR-required timeframe which applied to the original delayed
revocation, and failure to do so will be considered a delayed revocation
incident like any other.

----->8-----

For example, if a CA delays revocation on about 10,000 EV certs that
were improperly issued due to improper domain-control validation, then
Mozilla would, at a time of its own choosing within the 12 months after
the original incident is closed, email the CA a list of about 10,000 EV
certs issued by that CA and expect to see them all show up in CRLs and
OCSP responses as "revoked" within 24 hours of the email being received
by the CA's problem reporting address.
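
To make the pass/fail check in that example concrete, here is a minimal sketch of how a monitor might score such a test. The function name, the serial labels, and the shape of the `observed` data are all assumptions made for illustration; nothing here reflects an actual Mozilla or CA tool.

```python
from datetime import datetime, timedelta, timezone


def late_or_missing(received_at, deadline, observed):
    """Return the serials that were not seen as revoked in time.

    `observed` is a hypothetical mapping of certificate serial to the
    time the certificate was first seen as revoked in a CRL or OCSP
    response, or None if it has not been seen as revoked at all.
    """
    cutoff = received_at + deadline
    return sorted(
        serial
        for serial, seen in observed.items()
        if seen is None or seen > cutoff
    )


# Time the request reached the CA's problem reporting address (made up).
received = datetime(2025, 1, 9, 12, 0, tzinfo=timezone.utc)
observed = {
    "serial-01": received + timedelta(hours=3),   # revoked in time
    "serial-02": received + timedelta(hours=30),  # revoked, but late
    "serial-03": None,                            # never seen as revoked
}

failures = late_or_missing(received, timedelta(hours=24), observed)
print(failures)  # ['serial-02', 'serial-03']
```

Any non-empty result would correspond to a new delayed revocation incident under the proposed text.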

As another, slightly more convoluted example, if a CA delays revocation on
some number of certificates due to 10 compromised keys, then Mozilla
would pick 10 public keys used in certificates issued by that CA and
require all certificates that use those keys to be revoked, and that the
new certificates not use any of the identified keys.
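
The key-based variant could be sketched roughly as follows. This assumes the selector has a list of (serial, key fingerprint) pairs for the CA's unexpired certificates, for instance gathered from CT logs; the function names and data shapes are hypothetical, and the random choice stands in for whatever selection method Mozilla would actually use.

```python
import random
from collections import defaultdict


def pick_keys_and_certs(certs, n_keys, rng):
    """Choose n_keys public keys and list every certificate using them.

    `certs` is an iterable of (serial, key_fingerprint) pairs.
    Returns (set of chosen fingerprints, sorted serials to revoke).
    """
    by_key = defaultdict(list)
    for serial, key in certs:
        by_key[key].append(serial)
    # Sort the keys first so a seeded rng gives a repeatable choice.
    chosen = rng.sample(sorted(by_key), n_keys)
    to_revoke = sorted(s for key in chosen for s in by_key[key])
    return set(chosen), to_revoke


def reused_keys(replacements, chosen):
    """Return serials of replacement certificates that reuse a chosen key."""
    return sorted(serial for serial, key in replacements if key in chosen)


certs = [("s1", "keyA"), ("s2", "keyA"), ("s3", "keyB"), ("s4", "keyC")]
chosen, to_revoke = pick_keys_and_certs(certs, 2, random.Random())
print(chosen, to_revoke)
```

The second function covers the other half of the requirement: after revocation, any replacement certificate whose key is still in the chosen set would count as a failure.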

Mirroring the magnitude of the original delayed revocation both tests
the improvements made to handling the volume of the original incident,
and motivates CAs to put effort into minimising the scope of the
original incident, even if they can't prevent it entirely.

Similarly, requiring the same revocation period as the original incident
tests the same "kind" of incident again, on the presumption that a CA
that couldn't meet a 24 hour deadline the first time *might* still have
been able to meet a five day deadline, so testing a different scenario
is not nearly as useful a test as replicating the original scenario as
closely as possible.

Waiting until after the incident report is closed ensures that the test
is a "fair" one, in that it isn't hitting a CA while it's still getting the
ship afloat again.  While CAs might try to delay the test by holding the
incident open forever, that would be offset by the poor optics of
having a big list of unresolved incidents -- if an incident's sole
action item was "deploy enhanced certificate linting tool", for example,
and the incident report is still open two years later, one might start
to question the CA's capacity to execute. Eventually that incident's
going to get closed, and if a bunch of big incidents all get closed at
once, that CA's going to have a busy year ahead of them...

I have deliberately made it Mozilla's role to select the certificates
and choose the timing, because as others have mentioned, I have zero
faith that all CAs will honestly select the certificates and timing of
the revocation in a random manner.  It is just too easy to put the thumb
on the scales with practically zero chance of being detected, to think
that *someone* won't do it.

As an aside, if Mozilla would like to outsource the reporting and
analysis of test revocations, I happen to have a high-volume revocation
monitoring service that's just itching to be more widely used...

- Matt
