On 06/12/2017 05:25 PM, Stefan Eissing wrote:
> I talked to the people originally writing our ssl OCSP code regarding
> feedback we got from the Let's Encrypt server outage [1]. We agreed
> that some valid points for improvement were raised and we need a 
> discussion about what should be done about it, here.
> 
> I identified the following points so far:
> 
> 1. Hand out existing responses until expired

I think the core mistake we make today is that we expire the entries in the
cache after SSLStaplingStandardCacheTimeout. Instead, we should keep them in
the cache for as long as they are valid, that is, until either the nextUpdate
field of the response or thisUpdate + SSLStaplingResponseMaxAge.
In addition, we should have a refresh parameter, which I would set as a
percentage of the elapsed lifetime (the span between thisUpdate and
nextUpdate, or the share of SSLStaplingResponseMaxAge that has already
elapsed). Once this refresh time has passed, OCSP responses should be
refreshed by a background job (possibly implemented via mod_watchdog).
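
To make the refresh idea concrete, here is a minimal sketch of the calculation
(in Python for brevity, not httpd's C). The names refresh_due_at, refresh_pct
and max_age are hypothetical; max_age stands in for SSLStaplingResponseMaxAge.
Preferring nextUpdate whenever the response carries one is an assumption of
this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def refresh_due_at(this_update: datetime,
                   next_update: Optional[datetime],
                   max_age: Optional[timedelta],
                   refresh_pct: float) -> datetime:
    """Return the point in time after which a background job should refresh.

    The response stays usable until nextUpdate (or thisUpdate + max_age when
    the response has no nextUpdate); the refresh is due once refresh_pct
    percent of that lifetime has elapsed.
    """
    if not 0 < refresh_pct <= 100:
        raise ValueError("refresh_pct must be in (0, 100]")
    if next_update is not None:
        valid_until = next_update
    elif max_age is not None:
        valid_until = this_update + max_age
    else:
        raise ValueError("need either next_update or max_age")
    lifetime = valid_until - this_update
    return this_update + lifetime * (refresh_pct / 100.0)


# Example: a response valid for four days, refreshed at half its lifetime.
issued = datetime(2017, 6, 12, tzinfo=timezone.utc)
due = refresh_due_at(issued, issued + timedelta(days=4), None, 50)
```

With these inputs the refresh becomes due two days after thisUpdate, while the
old response can still be handed out for another two days if the responder is
unreachable.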

> 2. Persist responses (is this just a config/default issue?)

This could become tricky given the various socache implementations we have.
The only option I can think of is persisting the whole cache when Apache is
shut down.
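
As a rough illustration of "persist the whole cache at shutdown", the sketch
below dumps an in-memory map to a file and reloads it at start, dropping
entries that have expired in the meantime. It is Python, not httpd code; the
function names, the JSON layout and the key format are all assumptions, and
it sidesteps the socache provider question entirely.

```python
import base64
import json
import os
import tempfile


def save_cache(path, cache):
    """Write the cache to path; cache maps id -> (der_bytes, valid_until)."""
    payload = {
        key: {"der": base64.b64encode(der).decode("ascii"),
              "valid_until": valid_until}
        for key, (der, valid_until) in cache.items()
    }
    # Write to a temporary file first, then rename, so a crash during
    # shutdown never leaves a half-written cache file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.replace(tmp, path)


def load_cache(path, now):
    """Load the cache from path, skipping entries no longer valid at now."""
    try:
        with open(path) as fh:
            payload = json.load(fh)
    except FileNotFoundError:
        return {}
    return {
        key: (base64.b64decode(item["der"]), item["valid_until"])
        for key, item in payload.items()
        if item["valid_until"] > now
    }
```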

> 3. Start update responses at server start/regular intervals

What I want to avoid is the server "hanging" at start because of a "hanging"
OCSP server. I admit that this can already happen today on the very first SSL
request with stapling turned on, but IMHO this is bad behavior. So either do
the work on a regular basis in the background and do not staple if there is
no valid OCSP response yet (I know Hanno won't like that :-)), or have an
initial (valid) OCSP response loaded from a file during startup. It would be
up to the admin to fill this file with a valid OCSP response before starting
httpd.
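
The first option could look roughly like the sketch below: the request path
only ever reads the cache and never contacts the OCSP responder, and anything
missing or due for refresh is queued for the background job. This is a Python
illustration under assumed names (Entry, StapleCache, staple_for, pending),
not a description of how mod_ssl is structured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    der: bytes          # encoded OCSP response
    refresh_at: float   # background refresh is due from this time on
    valid_until: float  # response must not be stapled after this time


class StapleCache:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}
        self.pending: List[str] = []  # keys the background job should fetch

    def staple_for(self, key: str, now: float) -> Optional[bytes]:
        """Return a response to staple, or None; never blocks on the network."""
        entry = self.entries.get(key)
        # Missing or stale-ish entries are queued once for the background job.
        if entry is None or now >= entry.refresh_at:
            if key not in self.pending:
                self.pending.append(key)
        # Keep handing out the cached response as long as it is still valid.
        if entry is not None and now < entry.valid_until:
            return entry.der
        return None
```

A handshake that gets None simply proceeds without a stapled response; the
second option from above would amount to filling entries from a file before
the first request arrives.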

> 4. Use something better than HTTP/1.0 requests

What issues do we have with the HTTP/1.0 requests?

> 
> I think 1) should be not too complicated code changes without
> any big restructuring. I saw Ruediger already doing some changes.
> 
> The reason for 2) is not clear to me. Is this just a configuration
> issue to have a persistent cache or is our giving up privileges
> limiting here?
> 
> As to 3, starting a task at server start or after a certain interval,
> do we have some infrastructure for this? Do we need something new?
> 
> On 4, it seems, we lack a good http(s) client. The code we use
> for proxying is not easily reused for new connections, or? I see
> more need for such a thing in the near future.
> 
> Feedback appreciated.

Regards

Rüdiger
