On 9/2/20 4:51 PM, Joe Orton wrote:
> On Wed, Sep 02, 2020 at 12:36:26PM +0200, Ruediger Pluem wrote:
>> will always fail since node->endp[index] should be zero. Hence it should not 
>> find a pool. Or am I off now?
> 
> I think we get very few ordering constraints without memory 
> synchronisation; regardless of the order of the C statements, another 
> thread can see a different/inconsistent memory state, but I'm not really 
> sure.

You mean due to CPU caches and because some of the values might be kept in 
registers? Would declaring these vars volatile help?
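
Thinking about it some more, a minimal sketch (not APR code; "node", "endp" and 
"index" are only stand-ins for the pool-debug fields we discussed) of why 
volatile alone would probably not help: it keeps the compiler from caching the 
values, but it does not order the stores and loads between threads. That needs 
a release/acquire pair (or a mutex):

#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the pool-debug fields, not APR code. */
struct node {
    char *endp[4];          /* plain stores, may be seen stale/reordered */
    _Atomic size_t index;   /* release/acquire supplies the ordering     */
};

/* Writer: make endp[i] visible before the new index becomes visible. */
static void publish(struct node *n, size_t i, char *p)
{
    n->endp[i] = p;
    atomic_store_explicit(&n->index, i + 1, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above.  With a
 * volatile (or plain) index instead, the reader could observe the new
 * index but still read a stale endp[i]. */
static char *lookup(struct node *n)
{
    size_t i = atomic_load_explicit(&n->index, memory_order_acquire);
    return i ? n->endp[i - 1] : NULL;
}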

> 
> I tried reproducing locally and can trigger crashes in these code paths 
> quite easily with running ab against pool-debug/event build with 
> memcache or shmcb configured as session cache.
> 
> In this case one thread is walking the pool hierarchy and trying to lock 
> a pool mutex which has already been deleted, which triggers a libc 
> assert() instead.  The mutex is in ptrans, I assume from another event 
> thread.

It looks like there is some mutex protection in apr_pool_clear_debug and 
apr_pool_destroy_debug, and there are comments that point to apr_pool_walk_tree 
(e.g.
https://svn.apache.org/viewvc/apr/apr/trunk/memory/unix/apr_pools.c?annotate=1878340#l1916).
Maybe the issue is that pool_destroy_debug doesn't do any locking before calling 
pool_clear_debug.
Can you provide a gdb print of the pool parameters passed to apr_pool_walk_tree 
in frames 15 through 8?
I am not sure how pool became zero in the mutex struct you dumped, as I see no 
such action in the mutex cleanup.

> 
> If we see this regularly I'd conclude the apr_pool_find() stuff is 
> simply unsafe and should be turned off.

As the crashes seem to be caused by an APR problem, shall we continue this 
discussion on dev@apr, or should we work it out in more detail here?
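
For reference, frame 17 shows where the tree walk originates: in pool-debug 
builds apr_table_addn validates that key and val outlive the table's pool, 
using apr_pool_find() and apr_pool_is_ancestor(). Roughly along these lines (a 
sketch, not the verbatim apr_tables.c code; check_owner is a made-up helper):

#include <stdio.h>
#include <stdlib.h>
#include "apr_pools.h"
#include "apr_tables.h"

/* Sketch of the pool-debug ownership check; only meaningful in
 * APR_POOL_DEBUG builds, where apr_pool_find() is available.
 * "check_owner" is a hypothetical helper, not an APR function. */
static void check_owner(apr_table_t *t, const char *key, const char *val)
{
    apr_pool_t *tp = apr_table_elts(t)->pool;

    if (!apr_pool_is_ancestor(apr_pool_find(key), tp)
        || !apr_pool_is_ancestor(apr_pool_find(val), tp)) {
        fprintf(stderr, "apr_table_addn: key/val not from an ancestor "
                        "pool of the table's pool\n");
        abort();
    }
}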

> 
> (gdb) where
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f842445f895 in __GI_abort () at abort.c:79
> #2  0x00007f842445f769 in __assert_fail_base (fmt=0x7f84245cd0b8 "%s%s%s:%u: 
> %s%sAssertion `%s' failed.\n%n", 
>     assertion=0x7f8424652110 "new_prio == -1 || (new_prio >= fifo_min_prio && 
> new_prio <= fifo_max_prio)", 
>     file=0x7f8424652104 "tpp.c", line=82, function=<optimized out>) at 
> assert.c:92
> #3  0x00007f842446ee76 in __GI___assert_fail (
>     assertion=assertion@entry=0x7f8424652110 "new_prio == -1 || (new_prio >= 
> fifo_min_prio && new_prio <= fifo_max_prio)", 
>     file=file@entry=0x7f8424652104 "tpp.c", line=line@entry=82, 
>     function=function@entry=0x7f84246521c0 <__PRETTY_FUNCTION__.0> 
> "__pthread_tpp_change_priority") at assert.c:101
> #4  0x00007f842464f849 in __pthread_tpp_change_priority 
> (previous_prio=previous_prio@entry=-1, new_prio=new_prio@entry=2088)
>     at tpp.c:82
> #5  0x00007f84246458a0 in __pthread_mutex_lock_full (mutex=0x7f83f80c8b18) at 
> ../nptl/pthread_mutex_lock.c:545
> #6  0x00007f8424645e05 in __GI___pthread_mutex_lock 
> (mutex=mutex@entry=0x7f83f80c8b18) at ../nptl/pthread_mutex_lock.c:73
> #7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at 
> locks/unix/thread_mutex.c:127
> #8  0x00007f84246f437e in apr_pool_walk_tree (data=0x7f83b9ffabc8, 
> fn=0x7f84246f4190 <pool_find>, pool=0x7f839000d680)
>     at memory/unix/apr_pools.c:1505
> #9  apr_pool_walk_tree (pool=0x7f839000d680, fn=0x7f84246f4190 <pool_find>, 
> data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #10 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, 
> fn=0x7f84246f4190 <pool_find>, pool=0x1482e30)
>     at memory/unix/apr_pools.c:1511
> #11 apr_pool_walk_tree (pool=0x1482e30, fn=0x7f84246f4190 <pool_find>, 
> data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #12 0x00007f84246f439e in apr_pool_walk_tree (data=0x7f83b9ffabc8, 
> fn=0x7f84246f4190 <pool_find>, pool=0x14828b0)
>     at memory/unix/apr_pools.c:1511
> #13 apr_pool_walk_tree (pool=0x14828b0, fn=0x7f84246f4190 <pool_find>, 
> data=0x7f83b9ffabc8) at memory/unix/apr_pools.c:1492
> #14 0x00007f84246f54d8 in apr_pool_walk_tree (data=0x7f83b9ffabc8, 
> fn=0x7f84246f4190 <pool_find>, pool=0x14822d0)
>     at memory/unix/apr_pools.c:1511
> #15 apr_pool_walk_tree (data=0x7f83b9ffabc8, fn=0x7f84246f4190 <pool_find>, 
> pool=0x14822d0) at memory/unix/apr_pools.c:1492
> #16 apr_pool_find (mem=mem@entry=0x7f83a4023f06) at 
> memory/unix/apr_pools.c:2291
> #17 0x00007f84246e72cd in apr_table_addn (t=0x7f83a4023ab0, 
> key=key@entry=0x7f83a4023f00 "Host", 
>     val=val@entry=0x7f83a4023f06 "localhost:8532") at tables/apr_tables.c:823
> #18 0x0000000000455bb4 in ap_get_mime_headers_core (r=r@entry=0x7f83a4002370, 
> bb=bb@entry=0x7f83a40230a0) at protocol.c:1353
> #19 0x0000000000458a3f in ap_read_request (conn=conn@entry=0x7f83a401b340) at 
> protocol.c:1513
> #20 0x0000000000434f6f in ap_process_http_async_connection (c=0x7f83a401b340) 
> at http_core.c:148
> #21 ap_process_http_connection (c=0x7f83a401b340) at http_core.c:252
> #22 0x000000000044cb00 in ap_run_process_connection 
> (c=c@entry=0x7f83a401b340) at connection.c:42
> #23 0x00007f842488c8f1 in process_socket (thd=thd@entry=0x7f8408004920, 
> p=<optimized out>, sock=0x7f8390001480, 
>     cs=0x7f83a4015ef0, my_child_num=my_child_num@entry=3, 
> my_thread_num=my_thread_num@entry=23) at event.c:1106
> #24 0x00007f842488cfd8 in worker_thread (thd=0x7f8408004920, dummy=<optimized 
> out>) at event.c:2396
> #25 0x00007f84246ff57e in dummy_worker (opaque=0x7f8408004920) at 
> threadproc/unix/thread.c:145
> #26 0x00007f8424643432 in start_thread (arg=<optimized out>) at 
> pthread_create.c:477
> #27 0x00007f842453b913 in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) up 7
> #7  0x00007f84246f3e0a in apr_thread_mutex_lock (mutex=0x7f83f80c8b10) at 
> locks/unix/thread_mutex.c:127
> 127           rv = pthread_mutex_lock(&mutex->mutex);
> (gdb) print *mutex
> $1 = {pool = 0x0, mutex = {__data = {__lock = 1094795585, __count = 
> 1094795585, __owner = 1094795585, __nusers = 1094795585, 
>       __kind = 1094795585, __spins = 16705, __elision = 16705, __list = 
> {__prev = 0x4141414141414141, 
>         __next = 0x4141414141414141}}, __size = 'A' <repeats 40 times>, 
> __align = 4702111234474983745}, 
>   cond = 0x4141414141414141, locked = 1094795585, num_waiters = 1094795585}
> 
> 

Regards

Rüdiger
