On Mon, Aug 6, 2012 at 9:42 PM, Matthieu Nottale <mnott...@aldebaran-robotics.com> wrote:
> Hi.
>
> I'm experiencing a deadlock on 2.0.19 while calling bufferevent_free from
> thread A, while thread B is in event_base_dispatch.
>
> Here are the two relevant backtraces:
>
> (gdb) bt
> #0  0xb7fe1424 in __kernel_vsyscall ()
> #1  0xb7d1c48c in pthread_cond_wait@@GLIBC_2.3.2 ()
>     at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:169
> #2  0xb7f8f2dc in evthread_posix_cond_wait ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #3  0xb7f776a0 in event_del_internal ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #4  0xb7f7752d in event_del ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #5  0xb7f83b12 in be_socket_destruct ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #6  0xb7f82172 in _bufferevent_decref_and_unlock ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #7  0xb7f823c9 in bufferevent_free ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
>
> (gdb) thread 10
> [Switching to thread 10 (Thread 0xb6a6db70 (LWP 18334))]#0  0xb7fe1424 in __kernel_vsyscall ()
> (gdb) bt
> #0  0xb7fe1424 in __kernel_vsyscall ()
> #1  0xb7d1f0b9 in __lll_lock_wait ()
>     at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
> #2  0xb7d1a559 in _L_lock_859 () from /lib/i386-linux-gnu/libpthread.so.0
> #3  0xb7d1a3eb in __pthread_mutex_lock (mutex=0xb6100780)
>     at pthread_mutex_lock.c:82
> #4  0xb7f8f0cb in evthread_posix_lock ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #5  0xb7f82064 in _bufferevent_incref_and_lock ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #6  0xb7f82f7f in bufferevent_writecb ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #7  0xb7f74d91 in event_persist_closure ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #8  0xb7f74ea7 in event_process_active_single_queue ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #9  0xb7f7510c in event_process_active ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #10 0xb7f75764 in event_base_loop ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #11 0xb7f751a2 in event_base_dispatch ()
>     from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
>
> The way I understand it:
> In event_del_internal(), the pthread_cond_wait is done while holding the
> bufev lock (acquired by _bufferevent_decref_and_unlock).
> In event_process_active_single_queue, the callback is
> bufferevent_writecb, which tries to acquire the bufev lock. The condition
> variable will only be signaled when bufferevent_writecb returns, which is
> not going to happen because that thread is blocked on the bufev lock:
> deadlock.
>
> The code can reach this point because event_process_active_single_queue
> temporarily releases th_base_lock before calling the callback function,
> which leaves a window for the other thread to acquire it, test for
> (base->current_event == ev) and enter the pthread_cond_wait.
>
> Said differently:
> 1)
> bufferevent_free
> BEV_LOCK(bufev)
> _bufferevent_decref_and_unlock
> be_socket_destruct
> event_del
> ACQUIRE(th_base_lock)
> event_del_internal(event ev)
> base = ev->ev_base;
> if (base->current_event == ev)
> EVTHREAD_COND_WAIT(base->current_event_cond, base->th_base_lock);
>
> 2)
> event_base_dispatch
> event_base_loop
> EVBASE_ACQUIRE_LOCK(base, th_base_lock);
> event_process_active
> event_process_active_single_queue
> base->current_event = ev;
> EVBASE_RELEASE_LOCK(base, th_base_lock)
> USER_CB
> bufferevent_writecb
> _bufferevent_incref_and_lock(bufferevent bufev)
> BEV_LOCK(bufev);
>
> Any idea how to fix this? I can't see a way out.
>

What if you free the bufferevent on the thread running event_base_dispatch?
Create a new event with a callback that frees that bufferevent, and make it
active from the thread that calls bufferevent_free today.