On Fri, May 12, 2017 at 06:42:20PM +0200, Daniel Schneller wrote:
> > That said, given that we can already look up a cert based on a name,
> > maybe in fact we could load all of them and just try to find a more
> > recent one if the first one reported by the SNI is outdated. I don't
> > know if that solves everything there.
> >
> It actually might. In the end it would be something like a map, with the
> key being the domain, and the value a list of pointers to the actual
> certificates, sorted by remaining validity, having shortest first.

That's already what is done in the SNI trees, except that the validity
date is not considered: the first one matching is retrieved.

> I think it would benefit Let's Encrypt and similar scenarios. I would
> still require reloads to pick up newly added certificates. But as renewed
> certificates overlap their predecessors' validity period, dropping them
> into a directory and just doing a reload maybe once a day would work.
> Clients would still get the older one, until it finally expired, but that
> should not matter, as we are not talking about revocations where
> switching to a new cert is wanted quickly.

Using the old one "until it expires" is what really causes me a problem
(and I understand that in your case that's what you need). There are
several reasons for preferring the latest one instead:

  - it might provide stronger algorithms

  - it might use a CA which is not being blacklisted (remember that
    people started to complain about haproxy.org causing them some
    warnings because the CA was considered unsafe)

  - it was issued in the past (minutes, hours, days) so is likely
    already valid regardless of any small time shift. Using the old one
    one minute past its validity date will be a big problem.

  - the change will be effective at the moment of reload, meaning that
    any surprise like an incomplete chain, incorrect OCSP, key size
    incompatible with certain browsers, will be identified at an
    expected moment and when it's not too late to fix it. By using the
    oldest one as long as possible, it would break at any time in the
    middle of the night and would do it once you cannot roll back.

And that's the point. Users praise haproxy's reliability but in fact it's
not (just) the code's reliability (git log --grep BUG shames us), but the
fact that it has always been designed to be used by humans, who make
mistakes and who want to spot them very quickly and to fix them before
they become a big trouble.
Config warnings/errors, checks for suspicious constructs and logs are
directly involved here. And we do know that our users occasionally fail
and we must help them recover, and even possibly cover their mistakes
before the boss or the customer has any chance to notice. So creating
something designed to fail by default behind their back without prior
notice and without the ability to quickly stop before anyone notices is
contrary to the philosophy here. That doesn't mean that what you need
must not be implemented, it means that under no circumstances should it
be the default nor happen to be enabled by default.

Thus I think that at a minimum, if we ever go in that direction, the
default behaviour must be the expected one (ie: use the most recent valid
cert), and maybe there could be an option to prefer the old one instead
and to apply a date margin (eg: avoid using this one if there's less than
a day left).

(...)

> PS: This is an interesting discussion, and I am happy to continue
> it, if anyone feels the same.

I would not be surprised if we get some followups in either direction.
Over the mid term, more and more people will be affected by related
situations and the whole aspect of cert renewal will eventually become
hot. But I strongly doubt we'll do anything for this in 1.8, though
collecting views, ideas and constraints can be useful to try to serve
everyone the best later.

> As I said, I will try to solve this via
> provisioning scripts in the meantime, so there is no time pressure.

That's perfect! Your feedback and possible trouble in doing this will
also definitely help!

thanks,
Willy
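
For illustration only, here is a minimal sketch of the selection policy
discussed in this message: by default pick the most recently issued cert
that is currently valid, with an optional mode that prefers the oldest
one as long as it still has a given margin left. This is not haproxy
code and says nothing about how the SNI trees are implemented; the names
Cert, pick_cert, prefer_oldest and margin are hypothetical, and the
fallback to the newest cert when no old one has enough margin left is an
assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Cert:
    # Hypothetical stand-in for a loaded certificate.
    name: str
    not_before: datetime
    not_after: datetime


def pick_cert(
    certs: List[Cert],
    now: datetime,
    prefer_oldest: bool = False,
    margin: timedelta = timedelta(0),
) -> Optional[Cert]:
    """Choose one cert among those matching the same name."""
    # Only consider certs whose validity period covers "now".
    valid = [c for c in certs if c.not_before <= now < c.not_after]
    if not valid:
        return None

    if prefer_oldest:
        # Opt-in mode: keep the cert expiring soonest, but only while it
        # still has at least "margin" of validity left.
        safe = [c for c in valid if c.not_after - now >= margin]
        if safe:
            return min(safe, key=lambda c: c.not_after)
        # Assumption: with nothing safe left, fall back to the default.

    # Default mode: the most recently issued valid cert.
    return max(valid, key=lambda c: c.not_before)
```

With two overlapping certs for the same name, the default call returns
the renewed one as soon as it is loaded, while prefer_oldest=True with
margin=timedelta(days=1) keeps serving the previous one until it has
less than a day left.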