On Thu, Feb 19, 2015 at 6:29 AM, Azat Khuzhin <[email protected]> wrote:

> On Thu, Feb 19, 2015 at 06:22:03AM -0800, Marco Graziano wrote:
> > On Thu, Feb 19, 2015 at 5:08 AM, Azat Khuzhin <[email protected]> wrote:
> >
> > > On Wed, Feb 18, 2015 at 03:56:11PM -0800, Marco Graziano wrote:
> > > > On Wed, Feb 18, 2015 at 8:43 AM, Marco Graziano <[email protected]> wrote:
> > > >
> > > > > On Tue, Feb 17, 2015 at 4:32 AM, Azat Khuzhin <[email protected]> wrote:
> > > > >
> > > > >> On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote:
> > > > >> > Hi Azat,
> > > > >>
> > > > >> Hi Marco,
> > > > >>
> > > > >> >
> > > > >> > The backtraces I included are from the only threads waiting while
> > > > >> > holding a lock. The other threads are providing convincing signs of
> > > > >> > good health in the trace log.
> > > > >>
> > > > >> Could you please post backtraces for all available threads?
> > > > >> (I hope you don't have 100 of them).
> > > > >
> > > > >
> > > > Hi Azat,
> > > >
> > > > As an experiment, I removed all the additional locks and set only the
> > > > flags BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE.
> > > >
> > > > I get very quickly into a deadlock with the following backtraces:
> > > >
> > > > (gdb) bt
> > > > #0  __lll_lock_wait () at
> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > > #1  0x00007ffff7770672 in _L_lock_953 () from
> > > > /lib/x86_64-linux-gnu/libpthread.so.0
> > > > #2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x6399f0)
> at
> > > > ../nptl/pthread_mutex_lock.c:114
> > >
> > > Hi Marco,
> > >
> > > First of all, could you post the exact commit hash, to make this specific?
> > >
> >
> > The version I am using is from
> > http://sourceforge.net/projects/levent/files/libevent/libevent-2.1/libevent-2.1.5.beta.tar.gz/download
> >
> >
> > >
> > > Could you check your program under valgrind memcheck/{drd,helgrind} (but
> > > with your internal locks, to reduce the number of lines in the report)?
> > > Maybe you forgot to release a lock somewhere else...
> > >
> >
> > I have removed all my internal locks.
> >
> > The flags used are (BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE |
> > BEV_OPT_DEFER_CALLBACKS).
> >
> > I enable locking:
> >
> > evbuffer_enable_locking(bufferevent_get_input(connection->bufev), NULL);
> > evbuffer_enable_locking(bufferevent_get_output(connection->bufev), NULL);
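> >
> > The bufferevent itself is created roughly like this before those calls (a
> > simplified sketch, not the exact code; base, fd and device_event_cb are
> > placeholder names, while the read/write callbacks are the ones that show up
> > in the backtraces):
> >
> > /* sketch: assumes evthread_use_pthreads() was called once at startup */
> > connection->bufev = bufferevent_socket_new(base, fd,
> >     BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE | BEV_OPT_DEFER_CALLBACKS);
> > bufferevent_setcb(connection->bufev,
> >                   device_read_cb, device_write_cb, device_event_cb,
> >                   connection);
> > bufferevent_enable(connection->bufev, EV_READ | EV_WRITE);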
> >
> > The full backtrace after deadlock is:
> >
> >
> > (gdb) thread apply all bt
> >
> > Thread 13 (Thread 0x7fffcf7fe700 (LWP 23500)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f430) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffcf7fe700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 12 (Thread 0x7fffcffff700 (LWP 23499)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f280) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffcffff700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 11 (Thread 0x7fffecff9700 (LWP 23498)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f0d0) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffecff9700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 10 (Thread 0x7fffed7fa700 (LWP 23497)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ef20) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffed7fa700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 9 (Thread 0x7fffedffb700 (LWP 23496)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ed70) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffedffb700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 8 (Thread 0x7fffee7fc700 (LWP 23495)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ebc0) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffee7fc700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 7 (Thread 0x7fffeeffd700 (LWP 23494)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ea10) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffeeffd700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 6 (Thread 0x7fffef7fe700 (LWP 23493)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
> > #1  0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61e860) at
> > workqueue.c:389
> > #2  0x00007ffff776e182 in start_thread (arg=0x7fffef7fe700) at
> > pthread_create.c:312
> > #3  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 5 (Thread 0x7fffeffff700 (LWP 23492)):
> > #0  __lll_lock_wait () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > #1  0x00007ffff7770672 in _L_lock_953 () from
> > /lib/x86_64-linux-gnu/libpthread.so.0
> > #2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0xf36860) at
> > ../nptl/pthread_mutex_lock.c:114
> > #3  0x00007ffff7b9d445 in bufferevent_setcb (bufev=0xf44030, readcb=0x0,
> > writecb=0x4046f8 <device_write_cb>, eventcb=0x0, cbarg=0x642d00)
> >     at bufferevent.c:372
> > #4  0x000000000040411f in device_chunk_cb (device=0x642d00) at
> cometa.c:474
> > #5  0x000000000040454d in device_read_cb (bev=0x642730, arg=0x642d00) at
> > cometa.c:612
> > #6  0x00007ffff7b9dd1c in bufferevent_run_deferred_callbacks_locked
> > (cb=<optimized out>, arg=0x642730) at bufferevent.c:150
> > #7  0x00007ffff7ba77ff in event_process_active_single_queue
> > (base=base@entry=0x7fffe4000900,
> > activeq=0x7fffe4000d50,
> >     max_to_process=max_to_process@entry=2147483647,
> endtime=endtime@entry=0x0)
> > at event.c:1626
> > #8  0x00007ffff7ba82cf in event_process_active (base=0x7fffe4000900) at
> > event.c:1689
> > #9  event_base_loop (base=0x7fffe4000900, flags=0) at event.c:1912
> > #10 0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450
> > #11 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at
> > pthread_create.c:312
> > #12 0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 4 (Thread 0x7ffff4b2e700 (LWP 23491)):
> > #0  0x00007ffff68976a3 in epoll_wait () at
> > ../sysdeps/unix/syscall-template.S:81
> > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe0000900,
> > tv=<optimized out>) at epoll.c:463
> > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffe0000900, flags=0)
> at
> > event.c:1898
> > #3  0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450
> > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at
> > pthread_create.c:312
> > #5  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 3 (Thread 0x7ffff532f700 (LWP 23490)):
> > #0  0x00007ffff68976a3 in epoll_wait () at
> > ../sysdeps/unix/syscall-template.S:81
> > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900,
> > tv=<optimized out>) at epoll.c:463
> > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900, flags=0)
> at
> > event.c:1898
> > #3  0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450
> > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
> > pthread_create.c:312
> > #5  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 2 (Thread 0x7ffff5b30700 (LWP 23489)):
> > #0  0x00007ffff68976a3 in epoll_wait () at
> > ../sysdeps/unix/syscall-template.S:81
> > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900,
> > tv=<optimized out>) at epoll.c:463
> > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900, flags=0)
> at
> > event.c:1898
> > #3  0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450
> > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at
> > pthread_create.c:312
> > #5  0x00007ffff689700d in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >
> > Thread 1 (Thread 0x7ffff7fdd780 (LWP 23485)):
> > #0  __lll_lock_wait () at
> > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > #1  0x00007ffff7770672 in _L_lock_953 () from
> > /lib/x86_64-linux-gnu/libpthread.so.0
> > #2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x642a50) at
> > ../nptl/pthread_mutex_lock.c:114
> > #3  0x00007ffff7b9686e in evbuffer_get_length (buffer=0x642930) at
> > buffer.c:610
> > #4  0x0000000000404782 in device_write_cb (bev=0xf44030, arg=0x642d00) at
> > cometa.c:677
> > #5  0x00007ffff7b9dd46 in bufferevent_run_deferred_callbacks_locked
> > (cb=<optimized out>, arg=0xf44030) at bufferevent.c:154
> > #6  0x00007ffff7ba77ff in event_process_active_single_queue
> > (base=base@entry=0x6218d0,
> > activeq=0x621d20,
> >     max_to_process=max_to_process@entry=2147483647,
> endtime=endtime@entry=0x0)
> > at event.c:1626
> > #7  0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
> > event.c:1689
> > #8  event_base_loop (base=0x6218d0, flags=0) at event.c:1912
> > #9  0x000000000040a50f in main () at cometa.c:1584
> >
> > Thanks,
> >
> > -Marco G.
> >
> > >
> > > > #3  0x00007ffff7b9686e in evbuffer_get_length (buffer=0x6398d0) at
> > > > buffer.c:610
> > > > #4  0x0000000000404723 in device_write_cb (bev=0x64f350,
> arg=0x639ca0) at
> > > > cometa.c:672
> > > > #5  0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0,
> > > iotype=4,
> > > > bufev=0x64f350) at bufferevent-internal.h:369
> > > > #6  bufferevent_writecb (fd=<optimized out>, event=<optimized out>,
> > > > arg=0x64f350) at bufferevent_sock.c:297
> > > > #7  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> > > > base=0x6218d0) at event.c:1531
> > > > #8  event_process_active_single_queue (base=base@entry=0x6218d0,
> > > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647,
> > > >     endtime=endtime@entry=0x0) at event.c:1590
> > > > #9  0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
> > > > event.c:1689
> > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
> > > > #11 0x000000000040a4b0 in main () at cometa.c:1579
> > > > (gdb) t 2
> > > > [Switching to thread 2 (Thread 0x7ffff5b30700 (LWP 28878))]
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> > > > (gdb) bt
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900,
> > > > tv=<optimized out>) at epoll.c:463
> > > > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900,
> flags=0)
> > > at
> > > > event.c:1898
> > > > #3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
> > > > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at
> > > > pthread_create.c:312
> > > > #5  0x00007ffff689700d in clone () at
> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > > > (gdb) t 3
> > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 28879))]
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > 81 in ../sysdeps/unix/syscall-template.S
> > > > (gdb) bt
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900,
> > > > tv=<optimized out>) at epoll.c:463
> > > > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900,
> flags=0)
> > > at
> > > > event.c:1898
> > > > #3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
> > > > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
> > > > pthread_create.c:312
> > > > #5  0x00007ffff689700d in clone () at
> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > > > (gdb) t 4
> > > > [Switching to thread 4 (Thread 0x7ffff4b2e700 (LWP 28880))]
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > 81 in ../sysdeps/unix/syscall-template.S
> > > > (gdb) bt
> > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > ../sysdeps/unix/syscall-template.S:81
> > > > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffc0000900,
> > > > tv=<optimized out>) at epoll.c:463
> > > > #2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffc0000900,
> flags=0)
> > > at
> > > > event.c:1898
> > > > #3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
> > > > #4  0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at
> > > > pthread_create.c:312
> > > > #5  0x00007ffff689700d in clone () at
> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > > > (gdb) t 5
> > > > [Switching to thread 5 (Thread 0x7fffeffff700 (LWP 28881))]
> > > > #0  __lll_lock_wait () at
> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > > 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such
> file
> > > or
> > > > directory.
> > > > (gdb) bt
> > > > #0  __lll_lock_wait () at
> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > > #1  0x00007ffff7770672 in _L_lock_953 () from
> > > > /lib/x86_64-linux-gnu/libpthread.so.0
> > > > #2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x64b320)
> at
> > > > ../nptl/pthread_mutex_lock.c:114
> > > > #3  0x00007ffff7b9d445 in bufferevent_setcb (bufev=0x64f350,
> > > > readcb=0x405022 <evhttp_read_cb>, writecb=0x4051ce <evhttp_write_cb>,
> > > >     eventcb=0x405129 <evhttp_event_cb>, cbarg=0x64e690) at
> > > bufferevent.c:372
> > > > #4  0x0000000000405ecb in send_http_response (connection=0x64e690,
> > > > response_buffer=0x7fffbc0008c0) at cometa.c:977
> > >
> > > What does this function do?
> > > But let's start from the valgrind report first.
> > >
> > > > #5  0x00000000004040ef in device_chunk_cb (device=0x639ca0) at
> > > cometa.c:471
> > >
> > > Hang on..
> > >
> > > Do you lock this device->pending->bufev somehow (or in some inner frame)?
> > > If so then *a deadlock is possible*, iff in thread #1 evbuffer_get_length()
> > > is called with this bufev's output, and this is the only issue that can
> > > happen here, if I understand your snippets+backtraces correctly.
>
> You did read the paragraph above, didn't you?
>

Yes, I did. I imagine you are referring to the additional locks. I removed all
my locks from everywhere.
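
If I read the last backtraces right, it looks like the classic AB/BA lock-order
inversion between two bufferevent locks: thread 5 runs device_read_cb under the
device bufferevent's lock and then wants the pending connection's lock inside
bufferevent_setcb(), while thread 1 runs device_write_cb under the connection's
lock and then wants what is presumably the device bufferevent's buffer lock
inside evbuffer_get_length(). A stripped-down illustration with plain pthread
mutexes (not my actual code, just the pattern):

#include <pthread.h>
#include <unistd.h>

/* lock_a stands in for the device bufferevent's lock,
 * lock_b for the pending HTTP connection bufferevent's lock */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *reader(void *arg)          /* plays the role of thread 5 */
{
    (void)arg;
    pthread_mutex_lock(&lock_a);        /* read callback runs under lock A */
    usleep(1000);                       /* widen the race window */
    pthread_mutex_lock(&lock_b);        /* bufferevent_setcb() on the other bufferevent */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *writer(void *arg)          /* plays the role of thread 1 */
{
    (void)arg;
    pthread_mutex_lock(&lock_b);        /* write callback runs under lock B */
    usleep(1000);
    pthread_mutex_lock(&lock_a);        /* evbuffer_get_length() on the device buffer */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, reader, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    /* with the opposite lock order above this usually deadlocks, which is the point */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

If that is what is happening, it would also explain why BEV_OPT_UNLOCK_CALLBACKS
made the deadlock disappear, since with that flag the callbacks no longer run
with the bufferevent lock held.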

Thanks,

-Marco G.

> Anyway, your new backtrace doesn't answer this question.
>
> Cheers,
> Azat.
>
> > >
> > > > #6  0x00000000004044ee in device_read_cb (bev=0x6396d0,
> arg=0x639ca0) at
> > > > cometa.c:607
> > > > #7  0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0,
> > > iotype=2,
> > > > bufev=0x6396d0) at bufferevent-internal.h:366
> > > > #8  bufferevent_readcb (fd=35, event=<optimized out>, arg=0x6396d0)
> at
> > > > bufferevent_sock.c:187
> > > > #9  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
> > > > base=0x7fffbc000900) at event.c:1531
> > > > #10 event_process_active_single_queue (base=base@entry
> =0x7fffbc000900,
> > > > activeq=0x7fffbc000d50,
> > > >     max_to_process=max_to_process@entry=2147483647,
> > > endtime=endtime@entry=0x0)
> > > > at event.c:1590
> > > > #11 0x00007ffff7ba82cf in event_process_active (base=0x7fffbc000900)
> at
> > > > event.c:1689
> > > > #12 event_base_loop (base=0x7fffbc000900, flags=0) at event.c:1912
> > > > #13 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
> > > > #14 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at
> > > > pthread_create.c:312
> > > > #15 0x00007ffff689700d in clone () at
> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > >
> > > (gdb) thread apply all bt
> > >
> > > >
> > > > Adding BEV_OPT_DEFER_CALLBACKS seems to improve things considerably and
> > > > I haven't gotten a deadlock yet.
> > > >
> > > > At some point I would have to put some locks back in for synchronization,
> > > > though. I have occasional buffer corruption, possibly from interleaved
> > > > read callbacks from different threads.
> > > >
> > > > Thanks,
> > > >
> > > > -Marco G.
> > > >
> > > >
> > > >
> > > > > Hi Azat,
> > > > >
> > > > > The other 2 threads are idle:
> > > > >
> > > > > #0  0x00007ffff68976a3 in epoll_wait () at
> > > > > ../sysdeps/unix/syscall-template.S:81
> > > > > #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x6218d0,
> tv=<optimized
> > > > > out>) at epoll.c:463
> > > > > #2  0x00007ffff7ba8106 in event_base_loop (base=0x6218d0, flags=0)
> at
> > > > > event.c:1898
> > > > > #3  0x000000000040a763 in main () at cometa.c:1583
> > > > >
> > > > >
> > > > >> >
> > > > >> > Definitely the write callback was called tens of thousands of times
> > > > >> > before hitting the deadlock, and I suspect luck or chance has little
> > > > >> > to do with it. It may have more to do with the fact that the read
> > > > >> > callback, on receiving a message, sends it down a bufferevent whose
> > > > >> > write callback is handled by another thread?
> > > > >>
> > > > >
> > > > > Are there any implications of the read callback receiving a message
> > > > > that is then sent to a bufferevent whose write callback runs on a
> > > > > different thread?
> > >
> > > I can't imagine any problems with this use case.
> > >
> > > Even with your internal locks, the *first* lock that must be acquired is
> > > the bufferevent lock, and it will block if one of the threads is already
> > > holding it (there were some issues with freeing bufferevents, but they were
> > > before finalizers, hence in 2.1 upstream they shouldn't exist anymore).
> > >
> > > Cheers,
> > > Azat.
> > >
> > > > >
> > > > >
> > > > >> >
> > > > >> > The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock
> > > > >> > has not appeared again. Before, it took about one hour of stress
> > > > >> > testing to appear. If removing the callback lock solved the problem,
> > > > >> > please help me understand what was happening in the first place, and
> > > > >> > whether, for instance, the added lock inside the callback was
> > > > >> > unnecessary or even the cause of the deadlock itself.
> > > > >
> > > > >
> > > > > After additional testing, I confirm that the deadlock has not appeared
> > > > > when setting BEV_OPT_UNLOCK_CALLBACKS.
> > > > >
> > > > > >
> > > > >> > Also, since enabling OpenSSL is coming next, I was wondering if you
> > > > >> > have a word of wisdom or two on how to best tackle the task at hand,
> > > > >> > or which version of libevent is best suited for the job. I am using
> > > > >> > the head of the tree, and if it takes some debugging, I would be
> > > > >> > okay with that. I am afraid that locking the callbacks may be needed
> > > > >> > with OpenSSL enabled.
> > > > >>
> > > > >> I prefer to use the latest versions, to be up-to-date.
> > > > >> bufferevent_openssl doesn't require bufferevent locking - look at
> > > > >> bufferevent_openssl_new_impl() in bufferevent_openssl.c
> > > > >>
> > > > >
> > > > > Hence using BEV_OPT_UNLOCK_CALLBACKS seems to be the way to go anyway.
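> > > > >
> > > > > In other words, roughly this at creation time (a sketch, creation flags
> > > > > only, everything else omitted):
> > > > >
> > > > > bev = bufferevent_socket_new(base, fd,
> > > > >     BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE |
> > > > >     BEV_OPT_DEFER_CALLBACKS | BEV_OPT_UNLOCK_CALLBACKS);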
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Marco G.
> > > > >
> > > > > >
> > > > >> > Thanks for the useful insights.
> > > > >> >
> > > > >> > -Marco G.
> > > > >>
> > > > >> And for next time: could you avoid top-posting, please?
> > > > >>
> > > > >
> > > > > ps. Sorry for that. Using GMail makes it too easy to forget.
> > > > >
> > > > >
> > > > >>
> > > > >> Cheers,
> > > > >> Azat.
> > > > >>
> > > > >> >
> > > > >> > On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <[email protected]> wrote:
> > > > >> >
> > > > >> > > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote:
> > > > >> > > > I am running into a deadlock with both libevent 2.0 and 2.1 that
> > > > >> > > > I cannot explain. One thread is executing a write callback and
> > > > >> > > > the other thread a read callback of a different bufferevent.
> > > > >> > > > Since they are both accessing the same data structure passed to
> > > > >> > > > the callbacks (arg=0x630c40), I am using a rwlock in the data
> > > > >> > > > structure. I did not expect, though, the evbuffer internal lock
> > > > >> > > > to interfere with the lock used to synchronize access to the
> > > > >> > > > data structure shared between the callbacks.
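> > > > >> > > >
> > > > >> > > > Roughly, the shared structure looks like this (a sketch; the
> > > > >> > > > field types are guesses, the names are the ones used in the
> > > > >> > > > snippets below):
> > > > >> > > >
> > > > >> > > > struct device {
> > > > >> > > >     pthread_rwlock_t        lock;         /* guards the fields below */
> > > > >> > > >     int                     state;        /* IDLE_STATE, WAITING_STATE, ... */
> > > > >> > > >     struct http_connection *pending;      /* connection being answered */
> > > > >> > > >     int                     ws_pending;
> > > > >> > > >     int                     chunk_length;
> > > > >> > > >     int                     ntoread;
> > > > >> > > >     sem_t                   active;
> > > > >> > > > };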
> > > > >> > > >
> > > > >> > > > I am including the backtraces and the relevant portions of code.
> > > > >> > > > I would appreciate it if someone could give me any suggestion on
> > > > >> > > > how to work around this.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > -Marco G.
> > > > >> > > >
> > > > >> > > > (gdb) t 1
> > > > >> > > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))]
> > > > >> > > > (gdb) bt
> > > > >> > > > #0  __lll_lock_wait () at
> > > > >> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > > >> > > > #1  0x00007ffff7770672 in _L_lock_953 () from
> > > > >> > > > /lib/x86_64-linux-gnu/libpthread.so.0
> > > > >> > > > #2  0x00007ffff77704da in __GI___pthread_mutex_lock
> > > > >> (mutex=0x620200) at
> > > > >> > > > ../nptl/pthread_mutex_lock.c:114
> > > > >> > > > #3  0x00007ffff7b9686e in evbuffer_get_length
> (buffer=0x630930)
> > > at
> > > > >> > > > buffer.c:610
> > > > >> > >
> > > > >> > > I'm not sure that these two backtraces could deadlock. Are you?
> > > > >> > > Could it be calling the write callback again and again, and you
> > > > >> > > were just lucky enough to always see such a backtrace?
> > > > >> > >
> > > > >> > > Maybe some other thread acquires the locks in a different order?
> > > > >> > > (I'd bet on this one.)
> > > > >> > >
> > > > >> > > Also, did you check the program under memcheck/drd/helgrind?
> > > > >> > > Again, maybe you were so *lucky* that you overwrote the evbuffer's
> > > > >> > > lock with your own, but that is so unlikely that you can just
> > > > >> > > ignore it.
> > > > >> > >
> > > > >> > > And if you could write a 50-100 line sample to reproduce the
> > > > >> > > issue, it would be very helpful.
> > > > >> > >
> > > > >> > > Cheers,
> > > > >> > > Azat.
> > > > >> > >
> > > > >> > > > #4  0x000000000040494b in device_write_cb (bev=0x123f640,
> > > > >> arg=0x630c40)
> > > > >> > > at
> > > > >> > > > cometa.c:667
> > > > >> > > > #5  0x00007ffff7ba2139 in bufferevent_trigger_nolock_
> > > (options=0,
> > > > >> > > iotype=4,
> > > > >> > > > bufev=0x123f640) at bufferevent-internal.h:369
> > > > >> > > > #6  bufferevent_writecb (fd=<optimized out>,
> event=<optimized
> > > out>,
> > > > >> > > > arg=0x123f640) at bufferevent_sock.c:297
> > > > >> > > > #7  0x00007ffff7ba7a19 in event_persist_closure
> (ev=<optimized
> > > out>,
> > > > >> > > > base=0x6218d0) at event.c:1531
> > > > >> > > > #8  event_process_active_single_queue (base=base@entry
> > > =0x6218d0,
> > > > >> > > > activeq=0x621d20, max_to_process=max_to_process@entry=
> > > 2147483647,
> > > > >> > > >     endtime=endtime@entry=0x0) at event.c:1590
> > > > >> > > > #9  0x00007ffff7ba82cf in event_process_active
> (base=0x6218d0)
> > > at
> > > > >> > > > event.c:1689
> > > > >> > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
> > > > >> > > > #11 0x000000000040a700 in main () at cometa.c:1571
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > bufferevent 0x123f640 write callback:
> > > > >> > > > 665    pthread_rwlock_wrlock(&device->lock);
> > > > >> > > > 666    if ((device->pending && device->state == WAITING_STATE) || device->ws_pending) {
> > > > >> > > > 667        int l = evbuffer_get_length(device_bev);
> > > > >>
> > > > >> How is device_bev set up?
> > > > >> (evbuffer_new()/evbuffer_enable_locking())
> > > > >>
> > > > >> > > > 668        pthread_rwlock_unlock(&device->lock);
> > > > >> > > > 669        output = bufferevent_get_output(device->pending->bufev);
> > > > >> > > > 670        if (evbuffer_get_length(output) == 0) {
> > > > >> > > > 671            /* free and close the connection */
> > > > >> > > > 672            pthread_rwlock_wrlock(&device->lock);
> > > > >> > > > 673            free_http_connection(device->pending);
> > > > >> > > > 674            device->pending = NULL;
> > > > >> > > > 675            device->state = IDLE_STATE;
> > > > >> > > > 676            device->chunk_length = 0;
> > > > >> > > > 677            device->ntoread = -1;
> > > > >> > > > 678
> > > > >> > > > 679            evbuffer_drain(device_bev, l);
> > > > >> > > > 680            sem_post(&device->active);
> > > > >> > > > 681            pthread_rwlock_unlock(&device->lock);
> > > > >> > > > 682        }
> > > > >> > > > 683    } else {
> > > > >> > > > 684        pthread_rwlock_unlock(&device->lock);
> > > > >> > > > 685    }
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > (gdb) t 3
> > > > >> > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))]
> > > > >> > > > (gdb) bt
> > > > >> > > > #0  pthread_rwlock_wrlock () at
> > > > >> > > >
> > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
> > > > >> > > > #1  0x0000000000404667 in device_read_cb (bev=0x630730,
> > > > >> arg=0x630c40) at
> > > > >> > > > cometa.c:604
> > > > >> > > > #2  0x00007ffff7ba2324 in bufferevent_trigger_nolock_
> > > (options=0,
> > > > >> > > iotype=2,
> > > > >> > > > bufev=0x630730) at bufferevent-internal.h:366
> > > > >> > > > #3  bufferevent_readcb (fd=34, event=<optimized out>,
> > > arg=0x630730)
> > > > >> at
> > > > >> > > > bufferevent_sock.c:187
> > > > >> > > > #4  0x00007ffff7ba7a19 in event_persist_closure
> (ev=<optimized
> > > out>,
> > > > >> > > > base=0x7fffe8000900) at event.c:1531
> > > > >> > > > #5  event_process_active_single_queue (base=base@entry
> > > > >> =0x7fffe8000900,
> > > > >> > > > activeq=0x7fffe8000d50,
> > > > >> > > >     max_to_process=max_to_process@entry=2147483647,
> > > > >> > > endtime=endtime@entry=0x0)
> > > > >> > > > at event.c:1590
> > > > >> > > > #6  0x00007ffff7ba82cf in event_process_active
> > > > >> (base=0x7fffe8000900) at
> > > > >> > > > event.c:1689
> > > > >> > > > #7  event_base_loop (base=0x7fffe8000900, flags=0) at
> > > event.c:1912
> > > > >> > > > #8  0x000000000040a1ff in device_loop (arg=0x0) at
> cometa.c:1437
> > > > >> > > > #9  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700)
> at
> > > > >> > > > pthread_create.c:312
> > > > >> > > > #10 0x00007ffff689700d in clone () at
> > > > >> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> > > > >> > > >
> > > > >> > > > bufferevent 0x630730 read callback:
> > > > >> > > > 604    pthread_rwlock_wrlock(&device->lock);
> > > > >> > > > 605    if (device->state != WAITING_STATE) {
> > > > >> > > > 606
> > > > >> > > > 607        device->chunk_length = 0;
> > > > >> > > > 608        device->ntoread = -1;
> > > > >> > > > 609        /* in case of state == WAITING_STATE the counters reset is
> > > > >> > > >               done in the write callback after sending the response */
> > > > >> > > > 610    }
> > > > >> > > > 611    pthread_rwlock_unlock(&device->lock);
> > > > >>
> > > > >
> > > > >
>
> --
> Respectfully
> Azat Khuzhin
>
