Since it has come up in multiple delayed revocation events across multiple CAs, I want to push back on the idea that regulators are a valid reason for revocation delays in most cases.
There is no question that some countries have regulators that expect to be notified before certificate changes, and that a fraction of those also disallow replacement until the regulating body approves the change (which in my opinion is crazy, but to each their own). But in most of these cases, the reason is that the certificates are deployed on, e.g., communication systems between banks, payment terminals, etc. There is no technical reason why almost all of these spaces cannot have their own private PKI (with different rules than the WebPKI). Especially if the reason for the rule is incorporating the new certificate into other systems -- that means there is pinning or something equivalent going on, in which case you could just as well explicitly trust a self-signed certificate (or whatever suits your fancy).
I know that in some cases, the certificates are deployed at endpoints that may be accessed by payment terminals or other "covered devices" (meaning the regulations cover them) while also being accessible from web browsers. So even though the browser-facing side is not directly required to follow the same rules, in practice it has to, while also remaining compatible with the WebPKI. But there are technical solutions here too -- the easiest being, of course, "make the endpoints for covered devices and the web different!" But there are others: e.g. many CDNs and web proxies will let you send different certificates depending on the originating device, so you can serve a private PKI certificate (which requires approval and cannot change "fast") to covered devices and a WebPKI certificate (which needs agility) to everyone else, all on one endpoint.
I think the WebPKI should take actions commensurate with encouraging these applications to move to their own PKI hierarch(y/ies), which is to say, treat "regulators did not approve it in time" as not a valid reason to delay revocation.
In writing this, I have a question: does anyone know if most of the regulations are tied to the certificates, or to the cryptographic keys?
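To make the certificate-versus-key distinction concrete: if a relying system pins the SubjectPublicKeyInfo (SPKI) rather than the whole certificate, a reissuance that keeps the same key pair leaves the pin unchanged, while a fingerprint over the full certificate changes every time. Below is a minimal Go sketch of that, using only the standard library and assuming, purely for illustration, an ECDSA P-256 key, self-signed certificates standing in for CA-issued ones, and a hypothetical hostname `example.test`. It shows the mechanics only; it says nothing about what any particular regulator actually keys its approval on.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"fmt"
	"math/big"
	"time"
)

// spkiPin returns the base64 SHA-256 of the certificate's SubjectPublicKeyInfo.
// It depends only on the public key, not on serial, validity, or signature.
func spkiPin(cert *x509.Certificate) string {
	sum := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// certFingerprint returns the base64 SHA-256 of the whole DER certificate,
// which changes with every reissuance (new serial, validity, signature).
func certFingerprint(cert *x509.Certificate) string {
	sum := sha256.Sum256(cert.Raw)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// issue creates a self-signed certificate for key. Self-signing is an
// illustrative shortcut (assumption); a real reissuance is signed by a CA.
func issue(key *ecdsa.PrivateKey, serial int64, notBefore time.Time) (*x509.Certificate, error) {
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: "example.test"}, // hypothetical name
		DNSNames:     []string{"example.test"},
		NotBefore:    notBefore,
		NotAfter:     notBefore.Add(90 * 24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// reissueDemo issues two certificates over one key pair and returns their
// SPKI pins and full-certificate fingerprints.
func reissueDemo() (pinA, pinB, fpA, fpB string, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return
	}
	now := time.Now()
	certA, err := issue(key, 1, now)
	if err != nil {
		return
	}
	// The "replacement" certificate: same key, new serial, later validity.
	certB, err := issue(key, 2, now.Add(time.Hour))
	if err != nil {
		return
	}
	return spkiPin(certA), spkiPin(certB), certFingerprint(certA), certFingerprint(certB), nil
}

func main() {
	pinA, pinB, fpA, fpB, err := reissueDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("SPKI pins equal:   ", pinA == pinB)
	fmt.Println("fingerprints equal:", fpA == fpB)
}
```

Running it should report that the two SPKI pins match and the two certificate fingerprints differ. Reusing the key is, of course, only an option when the key itself is not the reason for revocation (e.g. it has not been compromised).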
My limited perusal seems to suggest that in many cases it is the key, not the certificate, that is tied to the notification/approval requirements, in which case a certificate change that keeps the same key pair should not trigger the regulators' requirements at all. But I'd love for people who actually know this space to chime in.
Tyrel
On Monday, May 20, 2024 at 8:46:00 PM UTC-4 Mike Shaver wrote:
> DELAYED REVOCATION IS TOO COMMON
>
> This is long enough, so I’ll spare readers dozens of links to delayed-revocation incidents collected in Bugzilla; we all know that pretty much any other incident that involves misissuance will come with a delayed-revocation chaser these days.
>
> In *many* of those delrev (?) incidents, we see a phrase like “we requested that our subscribers revoke and reissue”. They are not informing their subscribers as to a fixed revocation timeline, but rather simply asking those subscribers if they would please do the revocation process when they’re able. In one case, I heard of a revocation request from a major CA that didn’t even have a timeline *suggested*. Of course, the subscriber gets no value out of replacing their certs: it’s pure overhead, and if WebPKI were operated perfectly, it would never be necessary. This is an externality of, most often, a CA’s failure to sufficiently invest in understanding, implementing, and verifying the processes that they use to twirl the keys to the entire web’s security.
>
> Indeed in a number of cases the CAs didn’t even stop issuing once they realized that they were misissuing certs! Intentionally issuing certs that are known to be bad, what a world.
>
> While CAs generally claim that they would be able to handle a mass revocation incident (such as due to leaked key material), the evidence we have for CAs aggressively revoking as called for by the BRs and the root programs is…scant.
> We’ve seen “it was a long weekend” as a reason for delaying revocation for certs—including some used by a different part of the CA’s company! One CA has proposed a “global fire drill” to stress-test revocation procedures, but we’re seeing revocation timelines reaching multiple months right now, so…a lot of stuff would end up burning in that fire.
>
> CAs also tell us that they advocate and recommend for their subscribers to implement automation for cert management, but we never see any concrete targets or success criteria for those efforts, so they certainly seem to me to just be more “asking nicely”. (I’m not sure that all of the CAs claiming to be pushing for subscriber automation actually have robust ACME or similar support yet, in fact.)
>
> (Some of the CAs made explicit promises years ago to not delay revocation, some of them issued even though they knew that zlint showed an error—there are lots of additional twists on simply “issuing bad certs and not cleaning them up as agreed”.)
>
> Now, in the wake of these *many* delrev incidents, over years of history, the root programs have responded with pretty much no consequences whatsoever as far as I can tell. There’s one case open about Entrust’s overall behaviour, who are certainly over-achieving when it comes to ways to get location fields wrong, but they are definitely not the only ones who treat the BRs’ 1/5-day revocation instruction as instead meaning “when it’s convenient for the customer”.
>
> THE QUESTION
>
> So: what should be done to make revocations of misissued certificates—sometimes *intentionally* misissued certificates—as prompt as the BRs require?
>
> The cost equation for CAs is obviously skewed against the health of the web PKI, if we are to believe that the BRs are important.
> Once a CA has violated the BRs and misissued, it is *in their commercial interest* to not revoke promptly: it causes embarrassment and subscriber frustration, or even disruption to subscriber services. At the limit it might even lead a subscriber to change CAs if the reissuance events are frequent and disruptive enough.
>
> On the other hand, the more bad certs there are floating around, even if it’s “only” a matter of a case mismatch, the less interoperable the web PKI is, and the harder it is for a relying party to make effective use of WebPKI’s guarantees. Let’s please not end up with a “quirks mode” for TLS certificate handling!
>
> SOME OPTIONS
>
> One option: decide that there really are some BR violations that “don’t matter”, such that revocation can happen on a more relaxed, accommodating timeline—or perhaps not at all, just letting them expire as has been seen in some delrev incidents already. This would mean that we would still see incident reports that in theory help other CAs learn to put the postal code in the right field or similar, but subscribers and CAs and root programs would have to do less work.
>
> Another option: have affected certificates added to OneCRL after 72 hours. It would benefit from some automation, but it’s probably feasible to make relatively smooth. It is sometimes the case, worryingly, that it takes CAs a fair bit of time and multiple attempts to find all the affected certificates, so this might require some linter running off CT logs or similar as a watchdog.
>
> Another another option: forbid CAs from selling WebPKI certificates into environments where a) revocation within a 5-day limit is operationally infeasible, and b) disruption of the related services would cause risk to human health and safety or similar. There are apparently many organizations out there which are critical to national economies or whatever, but need literal Earth months to replace a certificate.
> These are clearly cases where the requirements of WebPKI are incompatible with the operational constraints of the subscriber, so it’s not a good idea to mix them. (I’m sure some CAs could offer help with private PKI systems, probably with compelling margins.)
>
> Yet another, this time somewhat more preventative: if a CA repeatedly demonstrates that they are unable or (always the case?) unwilling to honour their commitments to the BRs, impose validity length restrictions on certs that they issue. At least in that case future misissued certificates would not be in the wild for as long, and it would also show nicely that CAs’ advocacy for certificate automation was fruitful. Ignoring Entrust’s diatribe against 90-day validity periods in that weird blog post, I don’t think that any CA has made a credible case that their customers would not be able to handle rotating certificates every 90 days, even if they have to carve the new fingerprint into a mountain using a toothbrush or whatever. They’d even know it’s coming.
>
> One more: make delayed revocation incidents, specifically, more visible to subscribers and potential subscribers, and see if business pressure does what merely “agreeing legally to follow the BRs” (and optionally making empty “it’ll never happen again” promises) has been unable to do in too many cases.
>
> THANKS FOR READING
>
> I think the WebPKI is being poorly served by the *realities* of certificate integrity and misissuance responses. If nothing else, it’s causing a ton of delrev incidents for Ben to have to shepherd, without even module peers to assist him.
>
> Something needs to change.
--
You received this message because you are subscribed to the Google Groups "[email protected]" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/mozilla.org/d/msgid/dev-security-policy/4a8d1547-91b4-4fd8-a820-62cb83330181n%40mozilla.org.
