Re: haproxy 1.6.0 crashes

Willy Tarreau Tue, 20 Oct 2015 05:00:53 -0700

Hi Christopher,

On Tue, Oct 20, 2015 at 01:32:57PM +0200, Christopher Faulet wrote:
> >But then how do we know that an SSL_CTX is still in use when we want to
> >evict it from the cache and that we must not free it ? Is it just the
> >fact that between the moment it's picked from the cache using
> >ssl_sock_get_generated_cert() and the moment it's associated to a session
> >using SSL_set_SSL_CTX() it's not possible to yield and destroy the cached
> >object so no race is possible here ? If so I'm fine with it for now (though
> >it will become "fun" when we start to play with threads), I just want to
> >be certain we're not overlooking this part as well.
> 
> This is not an issue because when we get (or create) a SSL_CTX object 
> then it is associated to a session, without any interruption. So it 
> cannot be evicted from the cache in the middle.
> After this step, the refcount is >= 2, so, if the SSL_CTX object is 
> evicted from the cache, the refcount is decremented and the SSL_CTX is 
> not released. It will be automatically released with the closure of the 
> last SSL connection using it.


OK.

> But, now this works for a non-threaded environment. Is there any plan to 
> add thread support? If yes, this feature will not work.

Sure, we expected to be able to make some progress towards this in 1.6,
so now this is postponed to 1.7. We're at least trying not to make the
situation worse than it currently right now :-)

> >Also that raises another point : if the issue is related to SSL_CTX_free()
> >being called on static contexts, then to me it means that these contexts
> >were not properly refcounted when assigned to the SSL. Don't you think
> >that we shouldn't instead do something like the following to properly
> >refcount any context attached to an SSL and ensure that the SSL_CTX_free()
> >can always be performed regardless of parallel activities in the LRU tree
> >or anything else ?
> >
> >                 /* Alloc a new SSL session ctx */
> >                 conn->xprt_ctx = 
> >                 SSL_new(objt_server(conn->target)->ssl_ctx.ctx);
> >+            SSL_set_SSL_CTX(conn->xprt_ctx, 
> >objt_server(conn->target)->ssl_ctx.ctx);
> 
> This last call will have no effect. Because the SSL_CTX is the same, 
> this function returns immediately. Note that if the context changes, the 
> refcount of the old one is decremented.

OK so the cert's refcount is already incremented by SSL_new(), then
I don't understand why the SSL_CTX_free() fails in ssl_sock_close()
since it should only decrease the refcount from what I understand. Or
maybe it is just because we're not allowed to call it twice (which
makes sense to me) ?

> But there is no issue here, because the static contexts are only 
> released when HAProxy is stopped.

Sure but I thought we should ensure that all SSL_CTX are properly refcounted
instead of handling some of them one way and the other ones another way. I
mean, if openssl provides refcounting for us, better use it globally than
just for certain certs.

> Here is live cycle of static contexts:
> 
>   - HAProxy is started, static contexts are initialized by calling 
> SSL_CTX_new (refcount is set to 1).

OK.

>   - SSL connections use these contexts. SSL_new or SSL_set_SSL_CTX are 
> called to assign a context to a SSL object. The refcount is incremented 
> by 1 each time. When a SSL connection is closed, a call to SSL_free is 
> done to release the SSL object and the refcount of the associated 
> context is decremented. So the refcount is always greater or equal to 1.

OK.

>   - HAPRoxy is stopped, all connections are closed, and finally, static 
> contexts are freed by calling SSL_CTX_free. The refcount is equal to 1, 
> so when SSL_CTX_free is called, it reaches 0 and the contexts is freed.

OK. My understanding here (and what you already explained) is that it's
really freed once it reaches zero, either upon SSL_CTX_free() or upon
SSL_free(). So for certs belonging to the cache, in fact we have one
call to SSL_CTX_free() then one to many calls to SSL_free().

> >I'm having an issue here as well since the LRU's destroy callback is set
> >to SSL_CTX_free. This we start with a non-null refcount. I'm sorry if I am
> >not clear, but the problem I'm having could be described like this :
> >
> >   - two sets of entities can use a shared resource at any instant : cache
> >     and SSL sessions ;
> >   - each of them uses SSL_CTX_free() at release time to release the 
> >   object ;
> >   - SSL_CTX_free() takes care of the refcount to know if it must free or 
> >   not,
> >     which means that these two entities above are each responsible for one
> >     refcount point ;
> >   - the SSL_CTX_free() called by the cache is unconditional when the 
> >   object
> >     is evicted from the cache ;
> >   - the SSL_CTX_free() is only done if the cache is enabled ;
> 
> This last step is wrong. SSL_CTX_free is only done if the cache is 
> _DISABLED_ or NULL if you prefer. SSL_CTX_free is called when the object 
> is evicted from the cache _OR_ when the cache is disabled. So everything 
> works fine.

OK now I understand this one thanks to your explanation. You want to see
*exactly one* call to SSL_CTX_free() for each and every certificate,
correct ?

Then my understanding is that we should instead proceed differently :
  - the cert is generated. It gets a refcount = 1.
  - we assign it to the SSL. Its refcount becomes two.
  - we try to insert it into the tree. The tree will handle its freeing
    using SSL_CTX_free() during eviction.
  - if we can't insert into the tree because the tree is disabled, then
    we have to call SSL_CTX_free() ourselves, then we'd rather do it
    immediately. It will more closely mimmick the case where the cert
    is added to the tree and immediately evicted by concurrent activity
    on the cache.
  - we never have to call SSL_CTX_free() during ssl_sock_close() because
    the SSL session only relies on openssl doing the right thing based on
    the refcount only.
  - thus we never need to know how the cert was created since the
    SSL_CTX_free() is either guaranteed or already done for generated
    certs, and this protects other ones against any accidental call to
    SSL_CTX_free() without having to track where the cert comes from.

What do you think ?

Thanks!
Willy

Re: haproxy 1.6.0 crashes

Reply via email to