Re: Issues with shipping apr_thread_current() in APR 1.8.0 and an alternative approach for PCRE2 allocations in HTTPD

Yann Ylavic Tue, 28 Jun 2022 15:29:02 -0700

On Tue, Jun 28, 2022 at 6:08 PM Ivan Zhakov <i...@visualsvn.com> wrote:
>
> On Tue, 28 Jun 2022 at 00:24, Yann Ylavic <ylavic....@gmail.com> wrote:
> >
> > Hi Ivan,
> >
> > On Mon, Jun 27, 2022 at 8:19 PM Ivan Zhakov <i...@visualsvn.com> wrote:
> > >
> > > For the longer version, let me first outline a few problems with the 
> > > current apr_thread_current() API:
> > >
> > > 1) apr_thread_current_create() leaks the memory allocated in 
> > > alloc_thread() after the thread exits.
> >
> > Hm, alloc_thread() creates an unmanaged pool (thread->pool) and
> > allocates everything for the thread on that pool (including the thread
> > itself).
> > This pool is destroyed when the thread is joined (for attached
> > threads), or for detached threads when apr_thread_exit() is called
> > explicitly otherwise when the thread function actually exits.
> > So which memory leaks precisely? apr_thread_current_create() for the
> > process' main thread may leak the thread pool on exit because it's
> > detached, it has not been apr_thread_create()d and is usually not
> > apr_thread_exit()ed, but when the main thread exits (hence the
> > process) nothing really leaks.. Should it matter we could also have an
> > explicit apr_thread_current_unleak().
> >
> > Or do you mean the apr_threadattr_t that can be passed to
> > apr_thread_current_create()? That one is on the caller's
> > responsibility, and it can be reused for multiple threads if needed.
> >
> I meant a memory leak for native threads. For example, with an MPM
> that uses native Windows threadpool for worker threads.


That's not mpm_winnt right? (it does not use native threads anymore
and plays no apr_thread_current_create() game, it just uses
apr_thread_create() so apr_thread_current() works automatically
there).

Probably the thread pool lets you pass the thread function? So you
could "apr_thread_current_create(&mythread)" when entering the
function and "apr_pool_destroy(apr_thread_pool_get(mythread))" before
leaving?
That'd still leak if the thread calls TerminateThread() explicitly at
some point, but it seems that the native Windows thread API is not
very helpful for that since there is way to register a destructor for
when the thread terminates by any mean (like the pthread_key_create()
API's destructor for instance). If the native API does not help I
don't think that APR can do much better..

Also finally, if you want to leave native threads alone (i.e. not
APR-ize them), just do that, ap_regex will still work. The ap_regex
code is prepared to get apr_thread_current() == NULL, because it could
be called by a non-APR thread, and in this case it works exactly how
you propose it should (small stack with fall back to malloc/free for
PCRE1, or with PCRE2 using its native exec-context alloc/free since
it's opaque so you can't put it on the stack, but that'd be for every
ap_regexec() call).

Note that httpd could (hypothetically) later assert/assume, for some
other code, that it's never called from a non-APR thread and
enforce/omit the NULL checks, that's up to a "main" program to decide
since it has the full picture (httpd probably won't do that in a 2.x
version since it could break third-party MPMs creating native worker
threads only, but who knows if for 3.x it proves to be a nice/simpler
way to handle things in core/modules, then httpd-3.x could require
that the MPMs create APR worker threads only, or APR-ize them before
entering core/modules processing since there is an API for that..).

> > >
> > > 2) apr_thread_current_create() requires passing in the apr_threadattr_t 
> > > value that may be unknown to the caller, so the result may not correspond 
> > > to the actual thread attributes.
> >
> > This threadattr is for the users who know (attached vs detached
> > mainly), but if they don't know/care maybe they shouldn't use
> > apr_thread_current_create()?
> > It really matters if
> > apr_thread_detach()/apr_thread_join()/apr_thread_exit() is to be
> > called later anyway, but users that do this usually create their
> > thread with apr_thread_create(), and for these threads
> > apr_thread_current_create() is moot anyway (apr_thread_current() works
> > automagically for them).
> > The main usage of apr_thread_current_create() is for the main thread,
> > and this one is always detached per definition. But if users know the
> > nature/lifetime of the native threads they want to APR-ize,
> > apr_thread_current_create() can be useful too.
> >
> I think the problem is that in the general case the API user still
> needs to pass in the proper apr_threadattr_t to receive the proper
> current apr_thread_t object (even if there's a shortcut for some
> cases).

This is the case for apr_thread_create() too, you need to pass an
apr_threadattr_t or use the default (NULL, i.e. a joinable thread).
apr_thread_current_create() uses the same API because it's creating an
apr_thread_t for the current thread, usable by the other apr_thread
functions. If you don't know what that thread is, either don't use
apr_thread_current_create() or pass it a NULL threadattr (you won't
call apr_thread_join() or apr_thread_exit() on it anyway either) and
find a native way to destroy its pool to avoid leaks.

> > >
> > > I think that we could try an alternative approach for that part of the 
> > > problem. The alternative approach would have the same characteristics as 
> > > the approach that had been used for PCRE1:
> > >
> > > -- Supply PCRE2 with a custom context with malloc and free 
> > > implementations.
> > > -- Those implementations would work by either using a stack buffer for 
> > > small allocations or by doing a malloc().
> > > -- This allocation scheme would have the same properties as the existing 
> > > scheme used when compiling with PCRE1.
> >
> > For PCRE2, you would malloc()/free() for every ap_regexec(), which can
> > be quite costly depending on the configuration/modules. This work was
> > done precisely for this concern, for the switch from PCRE1 to PCRE2 to
> > be as little costly as possible. The current implementation reuses the
> > same PCRE2 context per thread for the lifetime of httpd..
> > Same for PCRE1, besides the stack buffer for small vectors (which is
> > still there currently btw).
> >
> I was thinking about having a custom malloc()/free() implementation
> that uses stack memory for first N bytes and then falls back to
> ordinary malloc/free(). So for cases where the allocation fits into
> the stack threshold there would be no malloc()/free() calls, as it was
> with PCRE1 and POSIX_MALLOC_THRESHOLD.

As said above, this is how it works already when apr_thread_current() is NULL.
If it's not NULL, you get the nice behaviour that it reuses
efficiently the same vector (PCRE1) or context (PCRE2) for each call
on the same thread, with no allocation/free overhead or memory
fragmentation.

>
> > >
> > > With that in mind, the overall plan would be as follows:
> > >
> > > A) Implement the above approach for PCRE2 allocations in httpd.
> > > B) Remove the apr_thread_current() and APR_THREAD_LOCAL from APR trunk 
> > > and 1.8.x.
> > > C) Release APR 1.8.0.
> >
> > Frankly, I find it hard to shoot down an API that works because it
> > could be misused when playing with native threads, whereas it works
> > transparently when using the APR threads directly.
> >
> > Isn't somewhere:
> >  apr_thread_data_set(my_data, my_key, NULL, apr_thread_current());
> > And elsewhere:
> >  apr_thread_data_get(&my_data, my_key, apr_thread_current());
> > quite nice compared to creating a native threadkey (whose number is
> > usually limited) for each entry?
>
> I don't find it too useful per-se:
> 1. The code has to rely that apr_thread_current() doesn't return NULL
> for some cases.

I see no issue in code using APR relying on APR features that work for
all the systems handled by APR.
apr_thread_current() returning indicates a situation where the program
does not use APR threads everywhere, if it happens the program can and
has to deal with it.

> 2. Changing data for other thread can lead to race conditions. What
> about detached threads? What if a thread already exited?

Nothing specific to APR here, that's related to threads in general.

> 3. There is no guarantee that the chosen key doesn't conflict with
> other module/application.

There is no key/namespace with the APR_THREAD_LOCAL mechanism, hence
no possible conflict.
The internal thread local context (i.e. the apr_thread_t pointer for
each thread) is a static _thread_local variable which can't be
hijacked in any way.

>
> I think that the threadkey() API doesn't have these problems. And it
> should be faster, because it does not require a hash lookup.

That's not true, have a look at thread_local (or _thread_local or
__thread or __declspec(thread) depending on the compiler) versus
pthread_getspecific() or alike, the former has almost no performance
impact while the latter clearly has (e.g. whenever the program uses
dynamic linkage).
Plus, the number of thread-keys is limited, and the more keys the more
performance impact, far from a simple hashtable lookup with no call
into libc's thread local storage handling.


Regards;
Yann.

Re: Issues with shipping apr_thread_current() in APR 1.8.0 and an alternative approach for PCRE2 allocations in HTTPD

Reply via email to