We are happy to have more participation in the acme working group. The IETF is based on development of standards by rough consensus. If you are willing to roll up your sleeves and participate (by reviewing/commenting on the drafts, and contributing to discussion) we are happy to have you. It is not called the 'acme server' working group. The working group is only as sleepy as we make it.
I will say that reading pages of a single message serves only to bury the lead. Crafting opinions that are clear and concise gets quicker results. We are all busy people.

Deb Cooley
[email protected]
Co-chair of acme

On Fri, Jun 23, 2023 at 1:44 PM Michael Sweet <msweet= [email protected]> wrote:

> FWIW, I agree with Matthew's comments and conclusions.
>
> In a somewhat-related situation for printing, we have an event notification interface (RFC 3996) where the printer can report back a time interval (in seconds) when the client should re-contact the printer to get more events. This is flexible enough to handle both printer/server load and to let the client know when it should anticipate more events, i.e., the printer is printing something, the event subscription is for 'job-completed', and the printer can estimate when the print job will complete - this is analogous to an ACME certificate's expiration/renewal date/time.
>
> Personally, the servers I maintain use Let's Encrypt and have a weekly cron job that checks whether the server's certificate needs to be renewed. If the ACME server could provide a "retry after" response then my servers (ACME clients) could do a better job of scheduling the next update and not bug the ACME server so often...
>
> > On Jun 23, 2023, at 12:20 PM, Matthew Holt <[email protected]> wrote:
> >
> > Hi all,
> >
> > I don't normally participate in these mailing lists, and last time I did I feel like the lack of discussion was discouraging, as what little discussion did occur wasn't taken seriously and was laced with complacency. Just stating up front that I don't have much hope for this message to be acted upon. That said, multiple people have strongly encouraged _someone_ to write the mailing list and bring the concerns of multiple ACME client developers to your attention.
> >
> > I speak for myself, but my views have been formed from a combination of personal experience developing ACME clients and discussion with other ACME client developers. So when I say "we" I do so loosely; sometimes it might just be me.
> >
> > First, I want to say: overall we like the idea of proactive ACME clients being able to know whether a certificate needs to be replaced sooner than expected, and we're glad to see an attempt at a solution drafted for standardization. But some of us do not think (current draft) ARI is The Way.
> >
> > Now that several ACME client authors have had the opportunity to implement the spec, we've noticed some issues, both with fundamental flaws in the concept of ARI and some in implementation. Initially these concerns were raised at the Let's Encrypt forums:
> >
> > - https://community.letsencrypt.org/t/can-ari-conforming-clients-be-granted-exemptions-to-relevant-rate-limits/195600?u=mholt
> > - https://community.letsencrypt.org/t/thoughts-from-starting-to-play-with-ari/200276?u=mholt
> > - https://community.letsencrypt.org/t/ari-rate-limits/198720?u=mholt
> > - https://community.letsencrypt.org/t/ari-retry-after-header/195471?u=mholt
> >
> > And the overwhelming response seems to be, "Meh, take it to the mailing list." (Except for one response by LE staff about rate limits, which was appreciated, at least.) So here we are.
> >
> > Cutting to the chase:
> >
> > With respect to ARI, ACME servers and clients have conflicts of interest. The ACME client's goal is to keep the site up (with renewed and unrevoked certificates); the optimal way to do this is to start renewing early and retry often. The ACME server's goal is to keep the service up; the optimal way to do this is to suppress clients that overload your capacity. Obviously, these two goals are in opposition with each other. Proactive clients can spike demand, which can cause service interruptions.
> > But service interruptions make clients more paranoid to retry even more often until it works, and so on. ARI narrows the timeframe in which a conforming client can retry failed renewals, which reduces reliability more as time goes on. Without ARI, this window is a reasonable ~60 days. With ARI, however, the window is reduced to just a few minutes, hours, or days. The less time until expiration, the less hope there is to renew the cert in time. As the draft currently stands, this is in the server's interest, but not the client's.
> >
> > I can tell you, with the current draft, my ACME clients will use ARI as a signal to immediately try renewing a certificate, not for scheduling a renewal in the future.
> >
> > Here's why.
> >
> > The ACME client's goal is to keep the site up (with renewed and unrevoked certificates). If everything always worked, we'd simply renew after about 99% of the certificate's lifetime.
> >
> > But obviously, that's not reality. In the presence of failures/uncertainty, the optimal way to maximize uptime is to start renewing early and retry often. In fact, just constantly be renewing. This offers the maximum possible chances to successfully get a certificate.
> >
> > But obviously, that's not reality. CAs rightly enforce rate limits, and service uptime is actually Pretty Good most of the time, so we can reduce network traffic, load on the CA, and pressure on CT infrastructure by waiting until about 2/3 into a certificate's lifespan before trying to renew. (With Let's Encrypt certificates this gives 30 days of runway.) This is a fair balance and works well in practice.
> >
> > But unfortunately, reality's not that simple. There are two off-nominal events that are often mentioned as the motivation for ARI:
> >
> > 1) Revocation
> > 2) Traffic smoothing around expected maintenance or heavy load
> >
> > Both of these can interfere with our happy little status-quo.
> > Revocation means we need to replace the certificate sooner than expected, and maintenance or congestion means we may need to renew the certificate later than expected.
> >
> > Enter ARI. ARI is the CA saying, "We suggest -- but do not require -- this specific timeframe within which to renew your certificate."
> >
> > There are some problems with this:
> >
> > 1) It is optional. No one will implement this. OK, some clients will -- but I can say with authority from years of experience that optional restrictions are not typically favored. Very little mainstream software follows best practices to a tee.
> >
> > 2) A narrower renewal timeframe makes clients less reliable. In theory it should make them *more* reliable since it smooths out traffic, thus improving CA availability. But this assumes that most clients actually implement and follow ARI. Since it's optional, I don't see that happening. Especially since most ACME clients are still running as static cron jobs like it's 2015...
> >
> > I'm sure ARI doesn't really change in the nominal case, which is 99.9..9% of the time. In fact, Let's Encrypt's ARI seems to correspond with when my clients attempt renewals on their own anyway. (So in that sense, ARI is actually useless 99.9..9% of the time?)
> >
> > But when a renewal window does change, what does that mean? Well, something is wrong. Either the certificate is being revoked, or the CA anticipates downtime or availability issues.
> >
> > Uh oh. That's bad news for a good little client which is trying its best to keep its sites (potentially tens of thousands of them) online.
> >
> > If we wait until the (adjusted) window to start renewing, we run ourselves closer to the imminently-impending revocation or the expiration of the certificate, lowering our chances of a successful renewal. If this is a mass or CA-wide event, other clients have surely noticed too. Best to renew ASAP and give ourselves more chances for success.
> > Worst-case scenario, we'll retry all the way into the designated window in which we expect to be able to get a certificate anyway. And we might have to do this for 10s of thousands of certificates.
> >
> > Because ARI is optional, it only acts as an early warning for clients that wish for an advantage over other clients with the same goal when resources are scarce. In these conditions, it's first-come-first-serve and clients compete to preserve uptime for all their sites. (I think clients can still do this respectfully with backoff and jitter.)
> >
> > Note that this behavior is still in compliance with the draft ARI spec, which says:
> >
> >    Conforming clients MUST attempt renewal at a time of their choosing
> >    based on the suggested renewal window.
> >
> > It doesn't say the renewal MUST be attempted "within" the window, just "based on" the window. (A minor language change to the spec, by the way, will not change client behaviors. I think we need to take a different approach to ARI, read on.)
> >
> > Anyway, a few more practical issues/questions:
> >
> > 1) Many CAs enforce rate limits. If clients are to honor ARI windows, we would need a guarantee that the first successful cert within the ARI window will be allowed regardless of relevant rate limits. Because ARI restricts a client's ability to spread out renewals when managing certificates in bulk with respect to rate limits, the rate limits must NOT be a blocker when honoring ARI.
> >
> > 2) If ARI were actually enforced, some concerns would be resolved... for example, we can have assurances that other ACME clients are doing the same, thus improving CA availability. It would essentially be the CA scheduling each individual certificate for each ACME client instance -- that's quite a powerful idea, as long as availability is guaranteed (which it's not).
> >
> > 3) ARI does not scale well.
> > Some ACME clients manage 10K+ certificates, and in that case the client would have to check the ARI for at least 24 certificates per hour to get through them in a month. Deferring to the Retry-After header may result in insufficient throughput. The current expectation or convention is to check every certificate every 6-12 hours, or tens of thousands of checks per day. One endpoint per certificate multiple times per day is quite saturating. This is a considerable burden for both ACME clients and servers. I would like to explore options that do not involve 2+ HTTP requests per certificate.
> >
> > 4) Crafting the URL is convoluted. As Peter Cooper described it, "The core issue is that the URL you need to construct is based on an OCSP structure identifying the certificate, which requires taking one's existing certificate and parsing out the serial number and issuer, and also taking the intermediate certificate that signed it and getting its public key too. So rather than just, like, using the fingerprint of the existing leaf or something similarly simple that a lot of tooling can already give you, one needs to really dig into both the leaf, and the intermediate, and hash various pieces thereof, and then take all that to build a new ASN.1 structure." Why are we striving for near-parity with an OCSP request?? This should be orthogonal to OCSP, right?
> >
> > 5) Web browsers / HTTP clients are bound to "abuse" ARI because the GET request is not authenticated. Even if the information is not strictly sensitive, I can totally see some browsers or tools using ARI as a signal that a certificate is being revoked, and thus can no longer be trusted, and thus block a site before a server even sees that it needs to renew its cert. I could be incorrect, but can't the information needed to obtain ARI be scraped from CT logs?
> > If so, I think a global ARI monitor/database is inevitable, and that has interesting implications that I don't know have been fully realized.
> >
> > All in all, the current ARI spec feels a little rushed. I'm hoping Let's Encrypt's production deployment is meant to help gather feedback about ARI before finalizing it, rather than to solidify it. Can we revisit both its fundamentals and practical implications too?
> >
> > I would like to explore some alternatives to the current draft. I can think of two approaches that might address these concerns:
> >
> > A) Instead of a totally separate flow to obtain ARI, simply utilize a Retry-After header in the flow of existing ACME responses. Upon finalizing an order, the ACME server can respond with a Retry-After header which acts as the current-draft Retry-After header for ARI responses. The client then attempts renewal at/after the Retry-After time, but with the OCSP CertID added to the NewOrder object; this indicates to the ACME server that the client is asking if now is a good time to renew the certificate indicated by the CertID. If it's not a good time, the ACME server can reply as such, with another Retry-After, and the client then waits and repeats, until the server actually issues the certificate. If the client needs the certificate immediately, simply omit the CertID from the NewOrder and the normal, "non-ARI" flow is assumed. This is backwards-compatible and requires no additional infrastructure or endpoints.
> >
> > B) If we do need a separate flow for some reason, I would like to see a single endpoint containing a static JSON resource that describes all the active certificates that need early renewal, rather than one tediously-crafted URL per certificate. Certificates can be described by their NotBefore or NotAfter dates, serial numbers, or other relevant attributes.
> > For example, if just a few certs with certain serials were misissued, those serials could be enumerated at this endpoint. Or if a mass revocation is happening, the timeframe of NotBefore dates could be listed, and ACME clients can simply check against the certs they manage with those dates, and replace them. You can represent millions of certificates in, like, 85 bytes this way. And it's way less work for clients and servers. And lastly, drop the "window" idea -- certificates described by this endpoint should be renewed ASAP: try to renew immediately, then back off and retry, for reasons described above (once we know the future is uncertain and/or revocation is imminent, current certs can't be trusted and/or clients must try to preserve their sites' uptime).
> >
> > And finally, I want to bring attention to the longer-term prospects for ARI: it's quite possible that ARI will become irrelevant before it is widely adopted by most clients. This itself may discourage adoption. As stated above, ARI has two primary use cases: revocation and traffic smoothing. As we push for shorter certificate lifetimes, revocation should become irrelevant. And traffic smoothing will perhaps become a natural consequence as clients are renewing more frequently anyway. We all know revocation and long-lived certificates are broken, so I'd rather WebPKI developers focus our energy on the ACTUAL goal: short-lived certificates. We should not be focusing our ecosystem resources on infrastructure that acts as a band-aid for a broken leg.
> >
> > That said, I'm not opposed to the general idea of a renewal hint for clients in the meantime as long as it's simple, makes fundamental sense, and is actually effective. I think the issues described above are mostly solvable and now hopefully we can get there from here.
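
To illustrate proposal B from the quoted message, here is a minimal Python sketch of how a client might check the certificates it manages against a single static JSON resource. The field names (`serials`, `notBeforeRanges`, `start`, `end`) and the function names are hypothetical; nothing like them is defined in the ARI draft or in the message, which only says that certificates could be described by serial numbers or by NotBefore/NotAfter dates.

```python
import json
from datetime import datetime, timezone


def parse_notice(text):
    """Parse a hypothetical early-renewal resource into (serials, ranges)."""
    doc = json.loads(text)
    # Serial numbers are compared as lowercase hex strings (an assumption).
    serials = {s.lower() for s in doc.get("serials", [])}
    # Each range lists an inclusive span of NotBefore timestamps.
    ranges = [
        (datetime.fromisoformat(r["start"]), datetime.fromisoformat(r["end"]))
        for r in doc.get("notBeforeRanges", [])
    ]
    return serials, ranges


def needs_early_renewal(serial_hex, not_before, notice):
    """Return True if a certificate is named by serial or by NotBefore range."""
    serials, ranges = notice
    if serial_hex.lower() in serials:
        return True
    return any(start <= not_before <= end for start, end in ranges)


# Hypothetical resource: one misissued serial plus one mass-revocation range.
EXAMPLE = """{
  "serials": ["03AB"],
  "notBeforeRanges": [
    {"start": "2023-06-01T00:00:00+00:00", "end": "2023-06-03T00:00:00+00:00"}
  ]
}"""

notice = parse_notice(EXAMPLE)
issued = datetime(2023, 6, 2, tzinfo=timezone.utc)
print(needs_early_renewal("ff", issued, notice))
```

A client would fetch the resource once per polling interval and run every managed certificate through `needs_early_renewal`, which is one request in total instead of one request per certificate.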
> >
> > _______________________________________________
> > Acme mailing list
> > [email protected]
> > https://www.ietf.org/mailman/listinfo/acme
>
> ________________________
> Michael Sweet
>
> _______________________________________________
> Acme mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/acme
>
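
The client-side timing described in the quoted message (wait until about 2/3 of the certificate's lifetime, then retry with backoff and jitter) could look roughly like the Python sketch below. The function names, the 60-second base delay, and the 6-hour cap are assumptions chosen for illustration, and "full jitter" is only one common variant; the message does not specify any of these.

```python
import random
from datetime import datetime, timedelta, timezone


def renewal_start(not_before, not_after, fraction=2 / 3):
    """Time at which to begin renewing: a fraction of the way through the lifetime."""
    return not_before + (not_after - not_before) * fraction


def retry_delay(attempt, base=60.0, cap=6 * 3600.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with full jitter: pick a uniformly random delay
    between 0 and min(cap, base * 2**attempt). The base and cap values
    are illustrative assumptions.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling


# A 90-day certificate, matching the Let's Encrypt lifetime the message refers to.
issued = datetime(2023, 6, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_start(issued, expires))
print([round(retry_delay(n)) for n in range(5)])
```

For a 90-day certificate this starts renewing around day 60, which leaves the 30 days of runway mentioned in the message; the jitter spreads retries from many clients so they do not all hit the server at the same moment.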
