On Tue, Jun 28, 2022 at 6:08 PM Ivan Zhakov <i...@visualsvn.com> wrote: > > On Tue, 28 Jun 2022 at 00:24, Yann Ylavic <ylavic....@gmail.com> wrote: > > > > Hi Ivan, > > > > On Mon, Jun 27, 2022 at 8:19 PM Ivan Zhakov <i...@visualsvn.com> wrote: > > > > > > For the longer version, let me first outline a few problems with the > > > current apr_thread_current() API: > > > > > > 1) apr_thread_current_create() leaks the memory allocated in > > > alloc_thread() after the thread exits. > > > > Hm, alloc_thread() creates an unmanaged pool (thread->pool) and > > allocates everything for the thread on that pool (including the thread > > itself). > > This pool is destroyed when the thread is joined (for attached > > threads), or for detached threads when apr_thread_exit() is called > > explicitly otherwise when the thread function actually exits. > > So which memory leaks precisely? apr_thread_current_create() for the > > process' main thread may leak the thread pool on exit because it's > > detached, it has not been apr_thread_create()d and is usually not > > apr_thread_exit()ed, but when the main thread exits (hence the > > process) nothing really leaks.. Should it matter we could also have an > > explicit apr_thread_current_unleak(). > > > > Or do you mean the apr_threadattr_t that can be passed to > > apr_thread_current_create()? That one is on the caller's > > responsibility, and it can be reused for multiple threads if needed. > > > I meant a memory leak for native threads. For example, with an MPM > that uses native Windows threadpool for worker threads.
That's not mpm_winnt right? (it does not use native threads anymore and plays no apr_thread_current_create() game, it just uses apr_thread_create() so apr_thread_current() works automatically there). Probably the thread pool lets you pass the thread function? So you could "apr_thread_current_create(&mythread)" when entering the function and "apr_pool_destroy(apr_thread_pool_get(mythread))" before leaving? That'd still leak if the thread calls TerminateThread() explicitly at some point, but it seems that the native Windows thread API is not very helpful for that since there is way to register a destructor for when the thread terminates by any mean (like the pthread_key_create() API's destructor for instance). If the native API does not help I don't think that APR can do much better.. Also finally, if you want to leave native threads alone (i.e. not APR-ize them), just do that, ap_regex will still work. The ap_regex code is prepared to get apr_thread_current() == NULL, because it could be called by a non-APR thread, and in this case it works exactly how you propose it should (small stack with fall back to malloc/free for PCRE1, or with PCRE2 using its native exec-context alloc/free since it's opaque so you can't put it on the stack, but that'd be for every ap_regexec() call). Note that httpd could (hypothetically) later assert/assume, for some other code, that it's never called from a non-APR thread and enforce/omit the NULL checks, that's up to a "main" program to decide since it has the full picture (httpd probably won't do that in a 2.x version since it could break third-party MPMs creating native worker threads only, but who knows if for 3.x it proves to be a nice/simpler way to handle things in core/modules, then httpd-3.x could require that the MPMs create APR worker threads only, or APR-ize them before entering core/modules processing since there is an API for that..). > > > > > > 2) apr_thread_current_create() requires passing in the apr_threadattr_t > > > value that may be unknown to the caller, so the result may not correspond > > > to the actual thread attributes. > > > > This threadattr is for the users who know (attached vs detached > > mainly), but if they don't know/care maybe they shouldn't use > > apr_thread_current_create()? > > It really matters if > > apr_thread_detach()/apr_thread_join()/apr_thread_exit() is to be > > called later anyway, but users that do this usually create their > > thread with apr_thread_create(), and for these threads > > apr_thread_current_create() is moot anyway (apr_thread_current() works > > automagically for them). > > The main usage of apr_thread_current_create() is for the main thread, > > and this one is always detached per definition. But if users know the > > nature/lifetime of the native threads they want to APR-ize, > > apr_thread_current_create() can be useful too. > > > I think the problem is that in the general case the API user still > needs to pass in the proper apr_threadattr_t to receive the proper > current apr_thread_t object (even if there's a shortcut for some > cases). This is the case for apr_thread_create() too, you need to pass an apr_threadattr_t or use the default (NULL, i.e. a joinable thread). apr_thread_current_create() uses the same API because it's creating an apr_thread_t for the current thread, usable by the other apr_thread functions. If you don't know what that thread is, either don't use apr_thread_current_create() or pass it a NULL threadattr (you won't call apr_thread_join() or apr_thread_exit() on it anyway either) and find a native way to destroy its pool to avoid leaks. > > > > > > I think that we could try an alternative approach for that part of the > > > problem. The alternative approach would have the same characteristics as > > > the approach that had been used for PCRE1: > > > > > > -- Supply PCRE2 with a custom context with malloc and free > > > implementations. > > > -- Those implementations would work by either using a stack buffer for > > > small allocations or by doing a malloc(). > > > -- This allocation scheme would have the same properties as the existing > > > scheme used when compiling with PCRE1. > > > > For PCRE2, you would malloc()/free() for every ap_regexec(), which can > > be quite costly depending on the configuration/modules. This work was > > done precisely for this concern, for the switch from PCRE1 to PCRE2 to > > be as little costly as possible. The current implementation reuses the > > same PCRE2 context per thread for the lifetime of httpd.. > > Same for PCRE1, besides the stack buffer for small vectors (which is > > still there currently btw). > > > I was thinking about having a custom malloc()/free() implementation > that uses stack memory for first N bytes and then falls back to > ordinary malloc/free(). So for cases where the allocation fits into > the stack threshold there would be no malloc()/free() calls, as it was > with PCRE1 and POSIX_MALLOC_THRESHOLD. As said above, this is how it works already when apr_thread_current() is NULL. If it's not NULL, you get the nice behaviour that it reuses efficiently the same vector (PCRE1) or context (PCRE2) for each call on the same thread, with no allocation/free overhead or memory fragmentation. > > > > > > > With that in mind, the overall plan would be as follows: > > > > > > A) Implement the above approach for PCRE2 allocations in httpd. > > > B) Remove the apr_thread_current() and APR_THREAD_LOCAL from APR trunk > > > and 1.8.x. > > > C) Release APR 1.8.0. > > > > Frankly, I find it hard to shoot down an API that works because it > > could be misused when playing with native threads, whereas it works > > transparently when using the APR threads directly. > > > > Isn't somewhere: > > apr_thread_data_set(my_data, my_key, NULL, apr_thread_current()); > > And elsewhere: > > apr_thread_data_get(&my_data, my_key, apr_thread_current()); > > quite nice compared to creating a native threadkey (whose number is > > usually limited) for each entry? > > I don't find it too useful per-se: > 1. The code has to rely that apr_thread_current() doesn't return NULL > for some cases. I see no issue in code using APR relying on APR features that work for all the systems handled by APR. apr_thread_current() returning indicates a situation where the program does not use APR threads everywhere, if it happens the program can and has to deal with it. > 2. Changing data for other thread can lead to race conditions. What > about detached threads? What if a thread already exited? Nothing specific to APR here, that's related to threads in general. > 3. There is no guarantee that the chosen key doesn't conflict with > other module/application. There is no key/namespace with the APR_THREAD_LOCAL mechanism, hence no possible conflict. The internal thread local context (i.e. the apr_thread_t pointer for each thread) is a static _thread_local variable which can't be hijacked in any way. > > I think that the threadkey() API doesn't have these problems. And it > should be faster, because it does not require a hash lookup. That's not true, have a look at thread_local (or _thread_local or __thread or __declspec(thread) depending on the compiler) versus pthread_getspecific() or alike, the former has almost no performance impact while the latter clearly has (e.g. whenever the program uses dynamic linkage). Plus, the number of thread-keys is limited, and the more keys the more performance impact, far from a simple hashtable lookup with no call into libc's thread local storage handling. Regards; Yann.