On Mon, Aug 6, 2012 at 9:42 PM, Matthieu Nottale <
mnott...@aldebaran-robotics.com> wrote:

> Hi.
>
> I'm experiencing a deadlock on 2.0.19 when calling bufferevent_free from
> thread A, while thread B is in event_base_dispatch.
>
> Here are the two relevant backtraces:
>
> (gdb) bt
> #0  0xb7fe1424 in __kernel_vsyscall ()
> #1  0xb7d1c48c in pthread_cond_wait@@GLIBC_2.3.2 ()
>     at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:169
> #2  0xb7f8f2dc in evthread_posix_cond_wait ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #3  0xb7f776a0 in event_del_internal ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #4  0xb7f7752d in event_del ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #5  0xb7f83b12 in be_socket_destruct ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #6  0xb7f82172 in _bufferevent_decref_and_unlock ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #7  0xb7f823c9 in bufferevent_free ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
>
>
>
> (gdb) thread 10
> [Switching to thread 10 (Thread 0xb6a6db70 (LWP 18334))]#0  0xb7fe1424 in
> __kernel_vsyscall ()
> (gdb) bt
> #0  0xb7fe1424 in __kernel_vsyscall ()
> #1  0xb7d1f0b9 in __lll_lock_wait ()
>     at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
> #2  0xb7d1a559 in _L_lock_859 () from /lib/i386-linux-gnu/libpthread.so.0
> #3  0xb7d1a3eb in __pthread_mutex_lock (mutex=0xb6100780)
>     at pthread_mutex_lock.c:82
> #4  0xb7f8f0cb in evthread_posix_lock ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #5  0xb7f82064 in _bufferevent_incref_and_lock ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #6  0xb7f82f7f in bufferevent_writecb ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #7  0xb7f74d91 in event_persist_closure ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #8  0xb7f74ea7 in event_process_active_single_queue ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #9  0xb7f7510c in event_process_active ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #10 0xb7f75764 in event_base_loop ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
> #11 0xb7f751a2 in event_base_dispatch ()
>    from /home/bearclaw/aldebaran/qi-2/lib/qimessaging/build-linux32/sdk/lib/libqimessaging.so
>
>
>
> The way I understand it:
> In event_del_internal(), thread A calls pthread_cond_wait while still
> holding the bufev lock (taken in bufferevent_free and held through
> _bufferevent_decref_and_unlock).
> In event_process_active_single_queue, the callback thread B is running is
> bufferevent_writecb, which tries to acquire the same bufev lock. The
> condition variable is only signaled once bufferevent_writecb returns, which
> never happens because thread B is blocked on that lock: deadlock.
>
> The code can reach this point because event_process_active_single_queue
> temporarily releases th_base_lock before calling the callback function,
> which leaves a window for the other thread to acquire it, test for
> (base->current_event == ev ) and enter the pthread_cond_wait.
>
> Said differently:
> 1)
>   bufferevent_free
>       BEV_LOCK(bufev)
>      _bufferevent_decref_and_unlock
>          be_socket_destruct
>            event_del
>                ACQUIRE(th_base_lock)
>                event_del_internal(event ev)
>                   base = ev->ev_base;
>                   if (base->current_event == ev )
>                        EVTHREAD_COND_WAIT(base->current_event_cond,
>                                           base->th_base_lock);
>
>
>
> 2)
>  event_base_dispatch
>  event_base_loop
>     EVBASE_ACQUIRE_LOCK(base, th_base_lock);
>     event_process_active
>       event_process_active_single_queue
>          base->current_event = ev;
>          EVBASE_RELEASE_LOCK(base, th_base_lock)
>          USER_CB
>             bufferevent_writecb
>               _bufferevent_incref_and_lock(bufferevent bufev)
>                   BEV_LOCK(bufev);
>
> Any idea how to fix this? I can't see a way out.
>

What if you free the bufferevent on the thread running event_base_dispatch?

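Create a new event with a callback that frees that bufferevent, and activate
it from the thread that calls bufferevent_free today. event_del only waits
for a running callback when it is called from a thread other than the one
running the loop, so the free can no longer block there.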
