On Fri, Feb 5, 2021 at 1:55 PM Ruediger Pluem <rpl...@apache.org> wrote: > > I am not against this patch, but how does it resolve the crash in [1]? Just > curious.
With apr <= 1.7.0, the thread pool is destroyed only by apr_thread_exit(), not on normal return from the thread function. This was fixed in apr-trunk by sf with [2], and I went beyond in 1.7.x with [3] by destroying the pool in apr_thread_join(), for lifetime reasons. So for all released versions of apr (1.7.0 or 1.6.5), which the ci uses most tests besides POOL_DEBUG/ASAN, any thread that does not call apr_thread_exit() explicitly will "leak" its pool, until the parent pool is destroyed. I thought that due to this the h2 worker thread's pool was destroyed too late (as a sub-sub-pool of pchild since that's what slot_run()->pool is), and that some cleanup there was accessing a resource already destroyed by workers_pool_cleanup(). But looking at the stack trace again I realized that the issue was elsewhere, there shouldn't be any slot_run() thread alive after all the pre_cleanups of pchild have completed, and the stack trace from [1] clearly shows one.. So I went with r1886255 [4] to close a potential race condition between workers_pool_cleanup() and slot_run(), hopefully this is the relevant fix. I'll let this leave a bit in trunk for review (the ci passed already), and will add to the r1883668 proposal later. Regards; Yann. [1] https://travis-ci.com/github/apache/httpd/jobs/472333512 [2] http://svn.apache.org/r1460182 [3] http://svn.apache.org/r1884103 [4] http://svn.apache.org/r1886255