On Wed, Feb 18, 2015 at 8:43 AM, Marco Graziano <[email protected]> wrote:
> On Tue, Feb 17, 2015 at 4:32 AM, Azat Khuzhin <[email protected]> wrote: > >> On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote: >> > Hi Azat, >> >> Hi Marco, >> >> > >> > The backtraces I included are from the only threads waiting while >> holding a >> > lock. The other threads are providing convincing signs of good health in >> > the trace log. >> >> Could you please post backstraces for all available threads? >> (I hope you don't have 100 of them). > > Hi Azat, As an experiment, I removed all the additional locks and set only the flags BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE I get very quickly into a deadlock with the following backtraces: (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007ffff7770672 in _L_lock_953 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x6399f0) at ../nptl/pthread_mutex_lock.c:114 #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x6398d0) at buffer.c:610 #4 0x0000000000404723 in device_write_cb (bev=0x64f350, arg=0x639ca0) at cometa.c:672 #5 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, iotype=4, bufev=0x64f350) at bufferevent-internal.h:369 #6 bufferevent_writecb (fd=<optimized out>, event=<optimized out>, arg=0x64f350) at bufferevent_sock.c:297 #7 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, base=0x6218d0) at event.c:1531 #8 event_process_active_single_queue (base=base@entry=0x6218d0, activeq=0x621d20, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1590 #9 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at event.c:1689 #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 #11 0x000000000040a4b0 in main () at cometa.c:1579 (gdb) t 2 [Switching to thread 2 (Thread 0x7ffff5b30700 (LWP 28878))] #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 81 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) bt #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900, tv=<optimized out>) at epoll.c:463 #2 0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900, flags=0) at event.c:1898 #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 #4 0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at pthread_create.c:312 #5 0x00007ffff689700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) t 3 [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 28879))] #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 81 in ../sysdeps/unix/syscall-template.S (gdb) bt #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900, tv=<optimized out>) at epoll.c:463 #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1898 #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 #4 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at pthread_create.c:312 #5 0x00007ffff689700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) t 4 [Switching to thread 4 (Thread 0x7ffff4b2e700 (LWP 28880))] #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 81 in ../sysdeps/unix/syscall-template.S (gdb) bt #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffc0000900, tv=<optimized out>) at epoll.c:463 #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffc0000900, flags=0) at event.c:1898 #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 #4 0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at pthread_create.c:312 #5 0x00007ffff689700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) t 5 [Switching to thread 5 (Thread 0x7fffeffff700 (LWP 28881))] #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory. (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007ffff7770672 in _L_lock_953 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x64b320) at ../nptl/pthread_mutex_lock.c:114 #3 0x00007ffff7b9d445 in bufferevent_setcb (bufev=0x64f350, readcb=0x405022 <evhttp_read_cb>, writecb=0x4051ce <evhttp_write_cb>, eventcb=0x405129 <evhttp_event_cb>, cbarg=0x64e690) at bufferevent.c:372 #4 0x0000000000405ecb in send_http_response (connection=0x64e690, response_buffer=0x7fffbc0008c0) at cometa.c:977 #5 0x00000000004040ef in device_chunk_cb (device=0x639ca0) at cometa.c:471 #6 0x00000000004044ee in device_read_cb (bev=0x6396d0, arg=0x639ca0) at cometa.c:607 #7 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, iotype=2, bufev=0x6396d0) at bufferevent-internal.h:366 #8 bufferevent_readcb (fd=35, event=<optimized out>, arg=0x6396d0) at bufferevent_sock.c:187 #9 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, base=0x7fffbc000900) at event.c:1531 #10 event_process_active_single_queue (base=base@entry=0x7fffbc000900, activeq=0x7fffbc000d50, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1590 #11 0x00007ffff7ba82cf in event_process_active (base=0x7fffbc000900) at event.c:1689 #12 event_base_loop (base=0x7fffbc000900, flags=0) at event.c:1912 #13 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 #14 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at pthread_create.c:312 #15 0x00007ffff689700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Adding the BEV_OPT_DEFER_CALLBACKS seems to improve considerably and I haven't got a deadlock yet. At some point I would have to put some locks back in for synchronization though. I have occasional buffer corruption possibly from interleaved read callbacks from different threads. Thanks, -Marco G. > Hi Azat, > > The other 2 threads are idle: > > #0 0x00007ffff68976a3 in epoll_wait () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x6218d0, tv=<optimized > out>) at epoll.c:463 > #2 0x00007ffff7ba8106 in event_base_loop (base=0x6218d0, flags=0) at > event.c:1898 > #3 0x000000000040a763 in main () at cometa.c:1583 > > >> > >> > Definitely the write callback was called tens of thousand of times >> before >> > incurring in the deadlock, and I suspect luck or chance has little to do >> > with the deadlock. It may have more to do with the fact that the read >> > callback receiving a message send it down a bufferevent with the write >> > callback handled by another thread? >> > > Any implications having the read callback receiving a message than is then > sent to a bufferevent and the write callback for that message being on a > different thread? > > >> > >> > The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock has >> not >> > appeared again. Before it took about one hour of stress test to appear. >> If >> > removing the callback lock solved the problem, please help me understand >> > what was happening in the first place and if for instance the added lock >> > inside the callback was unnecessary or even the cause of the deadlock >> > itself. > > > After additional testing, I confirm that the deadlock has not appeared > when setting BEV_OPT_UNLOCK_CALLBACKS. > > > >> > Also, since enabling OpenSSL is coming next, I was wondering if you >> have a >> > word of wisdom or two, on how to better tackle the task at hands or >> which >> > version of libevent is best suited for the job. I am using the head of >> the >> > tree, or to do some debugging if that it was it takes, I would be okay >> with >> > it. I am afraid that locking the callbacks may be needed with OpenSSL >> > enabled. >> >> I prefer to use the latest versions, to be up-to-date. >> bufferevent_openssl doesn't require bufferevent locking - look at >> bufferevent_openssl_new_impl() in bufferevent_openssl.c >> > > Hence using BEV_OPT_UNLOCK_CALLBACKS seems the way to go anyway. > > Thanks, > > -Marco G. > > > >> > Thanks for the useful insights. >> > >> > -Marco G. >> >> And for the next time: could you avoid top-posting please. >> > > ps. Sorry for that. Using GMail makes it too easy to forget. > > >> >> Cheers, >> Azat. >> >> > >> > On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <[email protected]> >> wrote: >> > >> > > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote: >> > > > I am running into a deadlock with both libevent 2.0 and 2.1 that I >> cannot >> > > > explain. One thread is executing a write callback and the other >> thread a >> > > > read callback of a different bufferevent. Since they are both >> accessing >> > > the >> > > > same data structure passed to the callbacks (arg=0x630c40), I am >> using a >> > > > rwlock in the data structure. I did not expect though the evbuffer >> > > internal >> > > > lock to interfere with the lock used to synchronize access to the >> data >> > > > structure shared between the callbacks. >> > > > >> > > > I am including the backtraces and the relevant portions of code. I >> would >> > > > appreciate if someone can give me any suggestion on how to work >> around >> > > this. >> > > > >> > > > Thanks, >> > > > >> > > > -Marco G. >> > > > >> > > > (gdb) t 1 >> > > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))] >> > > > (gdb) bt >> > > > #0 __lll_lock_wait () at >> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 >> > > > #1 0x00007ffff7770672 in _L_lock_953 () from >> > > > /lib/x86_64-linux-gnu/libpthread.so.0 >> > > > #2 0x00007ffff77704da in __GI___pthread_mutex_lock >> (mutex=0x620200) at >> > > > ../nptl/pthread_mutex_lock.c:114 >> > > > #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) at >> > > > buffer.c:610 >> > > >> > > I don't sure that this two backtraces could have deadlock. Do you? >> > > Could it call write callback again and again, but you was so lucky >> that >> > > you always see such backtrace? >> > > >> > > Maybe some other thread acquire some locks in other order? >> > > (I bet to this one). >> > > >> > > Also did you check program under memcheck/drd/helgrind? >> > > Again if you was so *lucky* that you overwrite lock of evbuffer by >> your >> > > own, but it is so unlikely, that you could just ignore this. >> > > >> > > And if you could write 50-100L sample to reproduce the issue, it will >> be >> > > very helpful. >> > > >> > > Cheers, >> > > Azat. >> > > >> > > > #4 0x000000000040494b in device_write_cb (bev=0x123f640, >> arg=0x630c40) >> > > at >> > > > cometa.c:667 >> > > > #5 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, >> > > iotype=4, >> > > > bufev=0x123f640) at bufferevent-internal.h:369 >> > > > #6 bufferevent_writecb (fd=<optimized out>, event=<optimized out>, >> > > > arg=0x123f640) at bufferevent_sock.c:297 >> > > > #7 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, >> > > > base=0x6218d0) at event.c:1531 >> > > > #8 event_process_active_single_queue (base=base@entry=0x6218d0, >> > > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647, >> > > > endtime=endtime@entry=0x0) at event.c:1590 >> > > > #9 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at >> > > > event.c:1689 >> > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 >> > > > #11 0x000000000040a700 in main () at cometa.c:1571 >> > > > >> > > > >> > > > bufferevent 0x123f640 write callback: >> > > > 665 pthread_rwlock_wrlock(&device->lock); >> > > > 666 if ((device->pending && device->state == WAITING_STATE) || >> > > > device->ws_pending) { >> > > > 667 int l = evbuffer_get_length(device_bev); >> >> How device_dev is installed? >> (evbuffer_new()/evbuffer_enable_locking()) >> >> > > > 668 pthread_rwlock_unlock(&device->lock); >> > > > 669 output = bufferevent_get_output(device->pending->bufev); >> > > > 670 if (evbuffer_get_length(output) == 0) { >> > > > 671 /* free and close the connection */ >> > > > 672 pthread_rwlock_wrlock(&device->lock); >> > > > 673 free_http_connection(device->pending); >> > > > 674 device->pending = NULL; >> > > > 675 device->state = IDLE_STATE; >> > > > 676 device->chunk_length = 0; >> > > > 677 device->ntoread = -1; >> > > > 678 >> > > > 679 evbuffer_drain (device_bev, l); >> > > > 680 sem_post(&device->active); >> > > > 681 pthread_rwlock_unlock(&device->lock); >> > > > 682 } >> > > > 683 } else { >> > > > 684 pthread_rwlock_unlock(&device->lock); >> > > > 685 } >> > > > >> > > > >> > > > (gdb) t 3 >> > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))] >> > > > (gdb) bt >> > > > #0 pthread_rwlock_wrlock () at >> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85 >> > > > #1 0x0000000000404667 in device_read_cb (bev=0x630730, >> arg=0x630c40) at >> > > > cometa.c:604 >> > > > #2 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, >> > > iotype=2, >> > > > bufev=0x630730) at bufferevent-internal.h:366 >> > > > #3 bufferevent_readcb (fd=34, event=<optimized out>, arg=0x630730) >> at >> > > > bufferevent_sock.c:187 >> > > > #4 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, >> > > > base=0x7fffe8000900) at event.c:1531 >> > > > #5 event_process_active_single_queue (base=base@entry >> =0x7fffe8000900, >> > > > activeq=0x7fffe8000d50, >> > > > max_to_process=max_to_process@entry=2147483647, >> > > endtime=endtime@entry=0x0) >> > > > at event.c:1590 >> > > > #6 0x00007ffff7ba82cf in event_process_active >> (base=0x7fffe8000900) at >> > > > event.c:1689 >> > > > #7 event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1912 >> > > > #8 0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437 >> > > > #9 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at >> > > > pthread_create.c:312 >> > > > #10 0x00007ffff689700d in clone () at >> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 >> > > > >> > > > bufferevent 0x630730 read call back: >> > > > 604 pthread_rwlock_wrlock(&device->lock); >> > > > 605 if (device->state != WAITING_STATE) { >> > > > 606 >> > > > 607 device->chunk_length = 0; >> > > > 608 device->ntoread = -1; >> > > > 609 /* in case of state == WAITING_STATE the counters reset is >> done >> > > in >> > > > the write callback after sending the response */ >> > > > 610 } >> > > > 611 pthread_rwlock_unlock(&device->lock); >> *********************************************************************** >> To unsubscribe, send an e-mail to [email protected] with >> unsubscribe libevent-users in the body. >> > >
