By way of introduction, my perspective is primarily that of an ACME
client developer, so you'll notice my bias toward simpler client
implementations as much as possible. However, I also am a web server
developer (the Caddy Web Server), so I can also appreciate the concerns
of server developers.
First, thanks to Roland and Jacob for submitting such a well-crafted proposal.
It is easy to read and understand, and it is mindful of certain complexities and
unknowns that will need further discussion.
The proposal suggests two problems that it attempts to solve:
1. Notifying subscribers of impending revocations
2. Scheduling regular certificate renewals
I do think both of these can be problems, but I am not sure if this proposal --
or any ACME extension, for that matter -- is the best solution to them.
## Impending revocations
In terms of trust, what is the difference between a certificate that is known
to be facing revocation soon and a certificate that is already revoked? In a
binary sense, if you know a certificate is going to be revoked, it's as good as
revoked. Why should you continue to trust a certificate when the CA already
knows it shouldn't continue to be trusted?
The proposal treats this endpoint as non-confidential, so we can assume the
CA-suggested renewal windows are public information, just as OCSP responses
are. Given that some vendors are already shipping their own revocation lists to
their clients ahead of CRLs, it's quite likely that some relying parties may
even use the proposed endpoint to get ahead of OCSP and CRLs and apply its
information toward a trust decision.
Fundamentally, the proposed extension isn't too different from OCSP already:
it's a (signed? unsigned?) response from the CA that tells you whether the
certificate is still believed to be trustworthy.
Before going too deep into implementation details, I think the philosophical
paradox this proposal introduces should be resolved.
## Scheduling certificate renewals
I have written a lot of code that renews certificates. The proposal mentions
that there are two main ways to schedule certificate renewals: 1) run a
timer/cron at static intervals, or 2) choose a renewal time based on the
certificate's actual NotBefore and NotAfter dates. I would add at least a third
way, which is what Caddy/CertMagic does: 3) scan all managed certificates at
short, frequent intervals, and if a certificate's lifetime is N% spent,
initiate a renewal right then. This is similar to (2) mentioned in the
proposal, but with a subtle difference: it's much simpler in that it doesn't
require setting a timer or scheduling each certificate individually, but you
still get the benefits of (2) and none of the downsides of (1). Method (3) also
does not require sleeping or making reservations, which are difficult to preempt.
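
To make method (3) concrete, here is a minimal sketch in Go. It is not CertMagic's actual code: the `managedCert` type, the function names, and the 2/3 ratio are assumptions chosen for illustration. It only shows the check a scanner would run on every pass.

```go
package main

import (
	"fmt"
	"time"
)

// managedCert is a hypothetical record of one certificate the client manages.
type managedCert struct {
	Name      string
	NotBefore time.Time
	NotAfter  time.Time
}

// needsRenewal reports whether at least `ratio` of the certificate's
// lifetime (NotBefore..NotAfter) has elapsed at time `now`.
func needsRenewal(c managedCert, now time.Time, ratio float64) bool {
	lifetime := c.NotAfter.Sub(c.NotBefore)
	if lifetime <= 0 {
		return true // malformed validity period: treat as due
	}
	elapsed := now.Sub(c.NotBefore)
	return float64(elapsed) >= ratio*float64(lifetime)
}

// scan returns the names of all certificates that are due at `now`.
// A real client would call this from a short, interval-based ticker.
func scan(certs []managedCert, now time.Time, ratio float64) []string {
	var due []string
	for _, c := range certs {
		if needsRenewal(c, now, ratio) {
			due = append(due, c.Name)
		}
	}
	return due
}

// exampleDue runs one scan over two made-up 90-day certificates.
func exampleDue() []string {
	day := 24 * time.Hour
	issued := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	certs := []managedCert{
		{Name: "old.example.com", NotBefore: issued, NotAfter: issued.Add(90 * day)},
		{Name: "new.example.com", NotBefore: issued.Add(50 * day), NotAfter: issued.Add(140 * day)},
	}
	now := issued.Add(61 * day) // 61 of 90 days spent for the first cert
	return scan(certs, now, 2.0/3.0)
}

func main() {
	fmt.Println(exampleDue())
}
```

Because every pass recomputes the decision from the certificate's own dates, nothing needs to be scheduled per certificate, and a missed pass is simply caught by the next one.
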
The downside that the proposal seems concerned with is "load clustering
for the issuing CA" -- I read that as "thundering herd"-type problems. This is
obviously a problem with (1), but for methods (2) and (3):
1. Staggering the start of ACME clients should disperse this load naturally. In
other words, not all ACME clients will start their poller/scanning routine at
the same time if they are duration/interval-based. Clients should avoid using
wall-clock times like "minute 30" or "hour 12" for the same reasons (1) should
be avoided.
2. As certificate lifetimes get shorter, the herds will thunder no matter how
staggered they are.
If the problem of load clustering is really the crux of this, then is it
possible for ACME servers to reply with a Retry-After header on
existing endpoints if they are getting overwhelmed?
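
On the client side, honoring such a header is not much code. The following Go sketch assumes the server sends a standard HTTP `Retry-After` value (either delay-seconds or an HTTP-date); the `retryDelay` helper and the 5-second fallback are hypothetical names and values, not part of any ACME library.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay interprets the Retry-After header relative to `now`.
// It returns `fallback` when the header is absent or cannot be parsed.
func retryDelay(h http.Header, now time.Time, fallback time.Duration) time.Duration {
	v := h.Get("Retry-After")
	if v == "" {
		return fallback
	}
	// Form 1: a non-negative number of seconds.
	if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	// Form 2: an HTTP-date; wait until that moment (never a negative wait).
	if t, err := http.ParseTime(v); err == nil {
		if d := t.Sub(now); d > 0 {
			return d
		}
		return 0
	}
	return fallback
}

// exampleDelays evaluates a few made-up header values against a fixed clock.
func exampleDelays() []string {
	now := time.Date(2020, 3, 4, 12, 0, 0, 0, time.UTC)
	values := []string{"120", "Wed, 04 Mar 2020 12:01:30 GMT", "soon", ""}
	var out []string
	for _, v := range values {
		h := http.Header{}
		if v != "" {
			h.Set("Retry-After", v)
		}
		out = append(out, retryDelay(h, now, 5*time.Second).String())
	}
	return out
}

func main() {
	fmt.Println(exampleDelays())
}
```

A client that already sleeps between retries could substitute this value for its own backoff whenever the server supplies one.
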
## Optional extension
This extension is very helpful for attentive, responsible clients. But for ACME
clients that are... I'll say "minimally implemented"... they may not take
advantage of this endpoint, and unfortunately, it's those clients which will
need it the most.
## OCSP stapling sorta works
For the record, a case study: Caddy/CertMagic wasn't impacted by the recent
Let's Encrypt revocation event because it attempts certificate renewal
immediately upon discovering a "Revoked" OCSP status. (It staples OCSP to all
certificates by default, caches the responses to disk, and refreshes them
about halfway through their validity period.) When it sees a Revoked response,
it does not staple that response to the current certificate, which keeps its
previously cached Valid response for ~3 more days while CertMagic attempts
renewal. After renewal
succeeds, the certificate is replaced, with a fresh new OCSP staple of course.
No relying party ever sees a Revoked certificate, even with immediate
revocation.
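
The decision described above can be sketched in a few lines of Go. This is an illustration only, not CertMagic's implementation: the `ocspStatus` and `action` types and the `onFreshOCSP` function are hypothetical, and fetching and verifying the OCSP response is left out entirely.

```go
package main

import "fmt"

// ocspStatus is a simplified stand-in for the status in an OCSP response.
type ocspStatus int

const (
	statusGood ocspStatus = iota
	statusRevoked
	statusUnknown
)

// action tells a hypothetical maintenance routine what to do next.
type action struct {
	ReplaceStaple bool // swap the cached staple for the fresh response
	RenewNow      bool // start certificate renewal immediately
}

// onFreshOCSP decides how to react to a freshly fetched OCSP status.
func onFreshOCSP(fresh ocspStatus) action {
	switch fresh {
	case statusGood:
		// Normal refresh: staple the newer response.
		return action{ReplaceStaple: true}
	case statusRevoked:
		// Keep serving the previously cached staple and renew right away.
		return action{RenewNow: true}
	default:
		// Unknown status: change nothing and try again on the next pass.
		return action{}
	}
}

func main() {
	fmt.Printf("%+v\n", onFreshOCSP(statusRevoked))
}
```

The important property is in the `statusRevoked` branch: the fresh response triggers renewal but is never stapled, so the window in which the old staple remains valid is what buys time for the replacement certificate.
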
The point is, I think existing infrastructure can work for this problem.
## Vision
In my opinion, the burden is on the clients to just be a little more
fault-tolerant. They should staple OCSP responses. They should do so
conservatively. They can call `renewCert()` when they see a Revoked response.
Ultimately, revocation is just the means to an end: short certificate lifetimes.