On 2/2/2017 14:45, Branko Čibej wrote: > On 02.02.2017 13:30, Stefan Hett wrote: >> On 2/2/2017 1:24 PM, Branko Čibej wrote: >>> On 02.02.2017 12:49, Stefan Hett wrote: >>>> On 2/2/2017 12:04 PM, Nick Kew wrote: >>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote: >>>>>> Hi, >>>>>> >>>>>> the issue was discovered as part of tracing down a deadlock >>>>>> condition in >>>>>> an SVN test [1]. >>>>>> >>>>>> As far as I understand it, the problem lies in the C-standard not >>>>>> explicitly defining when a function registered with atexit() is >>>>>> called >>>>>> in the context of thread termination [2]. >>>>> I had no idea atexit() was called on thread termination. I guess the >>>>> manpage must be misleading in telling us it happens at process exit? >>>> I expressed myself poorly a bit here. I didn't want to suggest >>>> atexit()-registered functions are called on thread termination but >>>> rather that threads can be terminated before the atexit-registered >>>> functions are called. >>>> So if apr_terminate is registered in a function called from a DLL via >>>> atexit(), it can happen (and in my scenario does happen) that threads >>>> got terminated before apr_terminate() is called. >>> But that should not matter. apr_thread_join() does not terminate a >>> thread, it waits for the thread to terminate. The thread handle should >>> remain valid until apr_thread_join() is called, even if the thread >>> terminated before the join(). >>> >> This is correct. The problem is that thd_cnt is not decremented >> (thd->td is valid, apr_thread_join() waits until the thread is (which >> it is already) and then returns APR_INCOMPLETE - nothing decrements >> the thd_cnt value for the already terminated thread). > > It turns out that this is a bug in APR's thread pool code, and has been > fixed on trunk. APR_INCOMPLETE is the _expected_ return code from > apr_thread_join() on Windows in this case, as far as I can interpret the > code. > As it turned out, the issue is actually still present on trunk. The full details behind the problem are summed up in a blog post now: http://www.luke1410.de/blog/?p=95
What do you (or the others) think. Does this warrant the patch being applied? I'd really not feel too good with leaving that issue unresolved in APR, given there's a fix available; but I'd certainly not commit it, if it's not getting support by the established community/committers. If someone things the patch needs to be improved (or a different approach to the problem should be taken), I'm certainly offering to come up with alternatives. Regards, Stefan