On Mon, Oct 19, 2015 at 03:06:44PM +0200, Christopher Faulet wrote:
> >OK so the unused objects in the tree have a refcount of 1 while the used
> >ones have 2 or more, thus the refcount is always valid. Good that also
> >means we must not test if the tree is null or not in ssl_sock_close(),
> >we must always free the ssl_ctx as long as it was dynamically created,
> >so that its refcount decreases, otherwise it keeps increasing upon every
> >reuse.
>
> No. Maybe my explanation was not really clear. The SSL_CTX refcount is
> not exposed. It is an internal parameter. So, it is not incremented when
> the SSL_CTX is pushed in the cache tree.
>
> The call to SSL_set_SSL_CTX increases the refcount and the call to
> SSL_free decrements it (when the connection is closed). And, of course,
> the call to SSL_CTX_free decrements it too. The SSL_CTX object is
> released when the refcount reaches 0.
>
> For a SSL_CTX object, SSL_CTX_free must be called once. When it is
> evicted from the cache tree (or when the tree is destroyed) _OR_ when
> the connection is closed if there is no cache tree. If we always release
> SSL_CTX objects when the SSL connection is closed, we will have
> undefined references for cached objects, leading to a segfault.
OK, I understood the opposite, which is that we kept a refcount for each
user (cache and/or sessions).
But then how do we know that an SSL_CTX is still in use when we want to
evict it from the cache and that we must not free it ? Is it just the
fact that between the moment it's picked from the cache using
ssl_sock_get_generated_cert() and the moment it's associated to a session
using SSL_set_SSL_CTX() it's not possible to yield and destroy the cached
object so no race is possible here ? If so I'm fine with it for now (though
it will become "fun" when we start to play with threads), I just want to
be certain we're not overlooking this part as well.
Also that raises another point : if the issue is related to SSL_CTX_free()
being called on static contexts, then to me it means that these contexts
were not properly refcounted when assigned to the SSL. Don't you think
that we shouldn't instead do something like the following to properly
refcount any context attached to an SSL and ensure that the SSL_CTX_free()
can always be performed regardless of parallel activities in the LRU tree
or anything else ?
/* Alloc a new SSL session ctx */
conn->xprt_ctx =
SSL_new(objt_server(conn->target)->ssl_ctx.ctx);
+ SSL_set_SSL_CTX(conn->xprt_ctx,
objt_server(conn->target)->ssl_ctx.ctx);
> The refcount is not incremented when a SSL_CTX object is pushed in the
> cache. There is no way to manually increment or decrement it. So, we
> must really know if the SSL_CTX object was cached or not when the SSL
> connection is closed.
I'm having an issue here as well since the LRU's destroy callback is set
to SSL_CTX_free. This we start with a non-null refcount. I'm sorry if I am
not clear, but the problem I'm having could be described like this :
- two sets of entities can use a shared resource at any instant : cache
and SSL sessions ;
- each of them uses SSL_CTX_free() at release time to release the object ;
- SSL_CTX_free() takes care of the refcount to know if it must free or not,
which means that these two entities above are each responsible for one
refcount point ;
- the SSL_CTX_free() called by the cache is unconditional when the object
is evicted from the cache ;
- the SSL_CTX_free() is only done if the cache is enabled ;
Due to last line I deduce that with this condition we're leaking SSL_CTX
when the cache is disabled. I'm possibly missing something, it's just that
seeing one entity monitor its adversary in a refcounted system triggers in
me the feeling that we're painting over the real problem.
I understand that what we're trying to achieve is only to avoid calling
SSL_CTX_free() on objects that were never dynamically allocated, which
is why I'm seriously thinking about refcounting these ones as well. I don't
agree with not calling it with objects that were dynamically allocated but
could never be added to the cache (ie cache disabled), for me it's a leak.
And in the end I suspect that we're still facing an imbalanced refcounting
system that instead of crashing will slowly leak some SSL_CTX.
> >>Well, I'm not an openssl guru. It is possible to store and retrieve data
> >>on a SSL_CTX object using SSL_CTX_set_ex_data/SSL_CTX_get_ex_data
> >>functions. But I don't know if this a good practice to use it. And I
> >>don't know if this is expensive or not.
> >
> >That's also what Rémi suggested. I don't know how it's used, I'm seeing
> >an index with it and that's already used for DH data, so I don't know how
> >it mixes (if at all) with this. I'm not much concerned by the access cost
> >in fact since we're supposed to access it once at session creation and once
> >during the release. It's just that I don't understand how this works. Maybe
> >the connection flag is simpler for now.
>
> Well, use SSL_CTX_set_ex_data/SSL_CTX_get_ex_data seems to work.
Yes I've seen, that's great, but I'd rather be sure we sort out the cause
of the problem before merging the fixes for what I think are its observable
consequences.
> But,
> I'm not a SSL expert, so maybe I missed something (and bugs related to
> my recent patches show that this is not false modesty...).
Don't worry for your modesty, I'm the worst ever analphabet when it comes
to openssl. I'm not proud of it but quite frankly SSL's complexity is the
major cause for its poor implementation and insecure deployment at many
places.
> I sent 2
> fixes for this bug [1][2]. If you want I rework one of them, I will be
> happy to do it.
>
> [1] https://www.mail-archive.com/[email protected]/msg19962.html
> [2] https://www.mail-archive.com/[email protected]/msg19995.html
Regarding the second one, maybe Rémi's review could help. I noticed that
you used gen_ssl_ctx_ptr_index = -1 which is the same value used for
dh_params. Based on its name it makes me think it's a position in an
array, so I'm not sure whether we can make them collide for example. I
really don't know this API at all.
Best regards,
Willy