On 06/12/2017 05:25 PM, Stefan Eissing wrote:
> I talked to the people originally writing our ssl OCSP code regarding
> feedback we got from the Let's Encrypt server outage [1]. We agreed
> that some valid points for improvement were raised and we need a
> discussion about what should be done about it, here.
>
> I identified the following points so far:
>
> 1. Hand out existing responses until expired

I guess the core mistake we make today is that we expire the entries in
the cache after SSLStaplingStandardCacheTimeout. But we should keep them
in the cache as long as they are valid (so until either what's in the
next update field of the response or this update +
SSLStaplingResponseMaxAge). Instead we should have a refresh parameter
that I would set as a percentage of the elapsed lifetime (so of the time
between this update and next update, or of SSLStaplingResponseMaxAge).
Once this refresh time has passed, OCSP responses should get refreshed
by a background job (possibly implemented by mod_watchdog).

> 2. Persist responses (is this just a config/default issue?)

This could become tricky given the various socache implementations we
have. I could only think of persisting the whole cache when Apache is
shut down.

> 3. Start update responses at server start/regular intervals

What I want to avoid is that the server "hangs" at start because of a
"hanging" OCSP server. I admit that this can already happen today on the
very first SSL request with stapling turned on, but IMHO this is bad
behavior. So either just do the stuff on a regular basis in the
background and do not staple if there is no valid OCSP response yet (I
know Hanno won't like that :-)), or have an initial (valid) OCSP
response loaded from a file during startup. It would be up to the admin
to fill this file with a valid OCSP response before starting httpd.

> 4. Use something better than HTTP/1.0 requests

What issues do we have with the HTTP/1.0 requests?

>
> I think 1) should be not too complicated code changes without
> any big restructuring. I saw Ruediger already doing some changes.
>
> The reason for 2) is not clear to me. Is this just a configuration
> issue to have a persistent cache or is our giving up privileges
> limiting here?
>
> As to 3, starting a task at server start or after a certain interval,
> do we have some infrastructure for this? Do we need something new?
>
> On 4, it seems, we lack a good http(s) client. The code we use
> for proxying is not easily reused for new connections, or? I see
> more need for such a thing in the near future.
>
> Feedback appreciated.

Regards

Rüdiger
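
PS: To make the refresh idea from point 1 a bit more concrete, here is a rough sketch of the arithmetic, in Python rather than httpd C code. The function names and the 0.5 default fraction are made up for illustration and are not existing mod_ssl API. It assumes a response stays valid until its next update, or until this update + SSLStaplingResponseMaxAge when there is no next update.

```python
from datetime import datetime, timedelta
from typing import Optional


def validity_end(this_update: datetime,
                 next_update: Optional[datetime],
                 max_age: timedelta) -> datetime:
    """Time at which a cached response must no longer be handed out.

    Uses next_update when the response carries one, otherwise falls
    back to this_update + max_age (the SSLStaplingResponseMaxAge idea).
    """
    if next_update is not None:
        return next_update
    return this_update + max_age


def refresh_time(this_update: datetime,
                 next_update: Optional[datetime],
                 max_age: timedelta,
                 refresh_fraction: float = 0.5) -> datetime:
    """Time after which a background job should try to refresh.

    refresh_fraction is the hypothetical refresh parameter: the share
    of the validity window that may elapse before refreshing starts.
    """
    end = validity_end(this_update, next_update, max_age)
    return this_update + (end - this_update) * refresh_fraction


def cache_action(now: datetime,
                 this_update: datetime,
                 next_update: Optional[datetime],
                 max_age: timedelta,
                 refresh_fraction: float = 0.5) -> str:
    """Classify a cached response as 'serve', 'refresh' or 'expired'.

    'refresh' means: keep stapling the cached response, but let the
    background job fetch a new one.
    """
    if now >= validity_end(this_update, next_update, max_age):
        return "expired"
    if now >= refresh_time(this_update, next_update, max_age,
                           refresh_fraction):
        return "refresh"
    return "serve"
```

The point of the middle state is that a failing OCSP responder only hurts once the response has really expired, not as soon as the refresh time has passed.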
