On Tue, Feb 17, 2015 at 4:32 AM, Azat Khuzhin <[email protected]> wrote:
> On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote: > > Hi Azat, > > Hi Marco, > > > > > The backtraces I included are from the only threads waiting while > holding a > > lock. The other threads are providing convincing signs of good health in > > the trace log. > > Could you please post backstraces for all available threads? > (I hope you don't have 100 of them). Hi Azat, The other 2 threads are idle: #0 0x00007ffff68976a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x6218d0, tv=<optimized out>) at epoll.c:463 #2 0x00007ffff7ba8106 in event_base_loop (base=0x6218d0, flags=0) at event.c:1898 #3 0x000000000040a763 in main () at cometa.c:1583 > > > > Definitely the write callback was called tens of thousand of times before > > incurring in the deadlock, and I suspect luck or chance has little to do > > with the deadlock. It may have more to do with the fact that the read > > callback receiving a message send it down a bufferevent with the write > > callback handled by another thread? > Any implications having the read callback receiving a message than is then sent to a bufferevent and the write callback for that message being on a different thread? > > > > The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock has > not > > appeared again. Before it took about one hour of stress test to appear. > If > > removing the callback lock solved the problem, please help me understand > > what was happening in the first place and if for instance the added lock > > inside the callback was unnecessary or even the cause of the deadlock > > itself. After additional testing, I confirm that the deadlock has not appeared when setting BEV_OPT_UNLOCK_CALLBACKS. > > > Also, since enabling OpenSSL is coming next, I was wondering if you have > a > > word of wisdom or two, on how to better tackle the task at hands or which > > version of libevent is best suited for the job. I am using the head of > the > > tree, or to do some debugging if that it was it takes, I would be okay > with > > it. I am afraid that locking the callbacks may be needed with OpenSSL > > enabled. > > I prefer to use the latest versions, to be up-to-date. > bufferevent_openssl doesn't require bufferevent locking - look at > bufferevent_openssl_new_impl() in bufferevent_openssl.c > Hence using BEV_OPT_UNLOCK_CALLBACKS seems the way to go anyway. Thanks, -Marco G. > > > Thanks for the useful insights. > > > > -Marco G. > > And for the next time: could you avoid top-posting please. > ps. Sorry for that. Using GMail makes it too easy to forget. > > Cheers, > Azat. > > > > > On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <[email protected]> > wrote: > > > > > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote: > > > > I am running into a deadlock with both libevent 2.0 and 2.1 that I > cannot > > > > explain. One thread is executing a write callback and the other > thread a > > > > read callback of a different bufferevent. Since they are both > accessing > > > the > > > > same data structure passed to the callbacks (arg=0x630c40), I am > using a > > > > rwlock in the data structure. I did not expect though the evbuffer > > > internal > > > > lock to interfere with the lock used to synchronize access to the > data > > > > structure shared between the callbacks. > > > > > > > > I am including the backtraces and the relevant portions of code. I > would > > > > appreciate if someone can give me any suggestion on how to work > around > > > this. > > > > > > > > Thanks, > > > > > > > > -Marco G. > > > > > > > > (gdb) t 1 > > > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))] > > > > (gdb) bt > > > > #0 __lll_lock_wait () at > > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > > > > #1 0x00007ffff7770672 in _L_lock_953 () from > > > > /lib/x86_64-linux-gnu/libpthread.so.0 > > > > #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x620200) > at > > > > ../nptl/pthread_mutex_lock.c:114 > > > > #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) at > > > > buffer.c:610 > > > > > > I don't sure that this two backtraces could have deadlock. Do you? > > > Could it call write callback again and again, but you was so lucky that > > > you always see such backtrace? > > > > > > Maybe some other thread acquire some locks in other order? > > > (I bet to this one). > > > > > > Also did you check program under memcheck/drd/helgrind? > > > Again if you was so *lucky* that you overwrite lock of evbuffer by your > > > own, but it is so unlikely, that you could just ignore this. > > > > > > And if you could write 50-100L sample to reproduce the issue, it will > be > > > very helpful. > > > > > > Cheers, > > > Azat. > > > > > > > #4 0x000000000040494b in device_write_cb (bev=0x123f640, > arg=0x630c40) > > > at > > > > cometa.c:667 > > > > #5 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, > > > iotype=4, > > > > bufev=0x123f640) at bufferevent-internal.h:369 > > > > #6 bufferevent_writecb (fd=<optimized out>, event=<optimized out>, > > > > arg=0x123f640) at bufferevent_sock.c:297 > > > > #7 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, > > > > base=0x6218d0) at event.c:1531 > > > > #8 event_process_active_single_queue (base=base@entry=0x6218d0, > > > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647, > > > > endtime=endtime@entry=0x0) at event.c:1590 > > > > #9 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at > > > > event.c:1689 > > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 > > > > #11 0x000000000040a700 in main () at cometa.c:1571 > > > > > > > > > > > > bufferevent 0x123f640 write callback: > > > > 665 pthread_rwlock_wrlock(&device->lock); > > > > 666 if ((device->pending && device->state == WAITING_STATE) || > > > > device->ws_pending) { > > > > 667 int l = evbuffer_get_length(device_bev); > > How device_dev is installed? > (evbuffer_new()/evbuffer_enable_locking()) > > > > > 668 pthread_rwlock_unlock(&device->lock); > > > > 669 output = bufferevent_get_output(device->pending->bufev); > > > > 670 if (evbuffer_get_length(output) == 0) { > > > > 671 /* free and close the connection */ > > > > 672 pthread_rwlock_wrlock(&device->lock); > > > > 673 free_http_connection(device->pending); > > > > 674 device->pending = NULL; > > > > 675 device->state = IDLE_STATE; > > > > 676 device->chunk_length = 0; > > > > 677 device->ntoread = -1; > > > > 678 > > > > 679 evbuffer_drain (device_bev, l); > > > > 680 sem_post(&device->active); > > > > 681 pthread_rwlock_unlock(&device->lock); > > > > 682 } > > > > 683 } else { > > > > 684 pthread_rwlock_unlock(&device->lock); > > > > 685 } > > > > > > > > > > > > (gdb) t 3 > > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))] > > > > (gdb) bt > > > > #0 pthread_rwlock_wrlock () at > > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85 > > > > #1 0x0000000000404667 in device_read_cb (bev=0x630730, > arg=0x630c40) at > > > > cometa.c:604 > > > > #2 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, > > > iotype=2, > > > > bufev=0x630730) at bufferevent-internal.h:366 > > > > #3 bufferevent_readcb (fd=34, event=<optimized out>, arg=0x630730) > at > > > > bufferevent_sock.c:187 > > > > #4 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, > > > > base=0x7fffe8000900) at event.c:1531 > > > > #5 event_process_active_single_queue (base=base@entry > =0x7fffe8000900, > > > > activeq=0x7fffe8000d50, > > > > max_to_process=max_to_process@entry=2147483647, > > > endtime=endtime@entry=0x0) > > > > at event.c:1590 > > > > #6 0x00007ffff7ba82cf in event_process_active (base=0x7fffe8000900) > at > > > > event.c:1689 > > > > #7 event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1912 > > > > #8 0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437 > > > > #9 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at > > > > pthread_create.c:312 > > > > #10 0x00007ffff689700d in clone () at > > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > > > > > > bufferevent 0x630730 read call back: > > > > 604 pthread_rwlock_wrlock(&device->lock); > > > > 605 if (device->state != WAITING_STATE) { > > > > 606 > > > > 607 device->chunk_length = 0; > > > > 608 device->ntoread = -1; > > > > 609 /* in case of state == WAITING_STATE the counters reset is > done > > > in > > > > the write callback after sending the response */ > > > > 610 } > > > > 611 pthread_rwlock_unlock(&device->lock); > *********************************************************************** > To unsubscribe, send an e-mail to [email protected] with > unsubscribe libevent-users in the body. >
