On Fri, Feb 5, 2021 at 1:55 PM Ruediger Pluem <rpl...@apache.org> wrote:
>
> I am not against this patch, but how does it resolve the crash in [1]? Just 
> curious.

With apr <= 1.7.0, the thread pool is destroyed only by
apr_thread_exit(), not on normal return from the thread function.
This was fixed in apr-trunk by sf with [2], and I went beyond in 1.7.x
with [3] by destroying the pool in apr_thread_join(), for lifetime
reasons.
So for all released versions of apr (1.7.0 or 1.6.5), which the ci
uses most tests besides POOL_DEBUG/ASAN, any thread that does not call
apr_thread_exit() explicitly will "leak" its pool, until the parent
pool is destroyed.

I thought that due to this the h2 worker thread's pool was destroyed
too late (as a sub-sub-pool of pchild since that's what
slot_run()->pool is), and that some cleanup there was accessing a
resource already destroyed by workers_pool_cleanup(). But looking at
the stack trace again I realized that the issue was elsewhere, there
shouldn't be any slot_run() thread alive after all the pre_cleanups of
pchild have completed, and the stack trace from [1] clearly shows
one..

So I went with r1886255 [4] to close a potential race condition
between workers_pool_cleanup() and slot_run(), hopefully this is the
relevant fix. I'll let this leave a bit in trunk for review (the ci
passed already), and will add to the r1883668 proposal later.

Regards;
Yann.

[1] https://travis-ci.com/github/apache/httpd/jobs/472333512
[2] http://svn.apache.org/r1460182
[3] http://svn.apache.org/r1884103
[4] http://svn.apache.org/r1886255

Reply via email to