On Thu, Feb 19, 2015 at 06:22:03AM -0800, Marco Graziano wrote: > On Thu, Feb 19, 2015 at 5:08 AM, Azat Khuzhin <[email protected]> wrote: > > > On Wed, Feb 18, 2015 at 03:56:11PM -0800, Marco Graziano wrote: > > > On Wed, Feb 18, 2015 at 8:43 AM, Marco Graziano <[email protected]> > > > wrote: > > > > > > > On Tue, Feb 17, 2015 at 4:32 AM, Azat Khuzhin <[email protected]> > > wrote: > > > > > > > >> On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote: > > > >> > Hi Azat, > > > >> > > > >> Hi Marco, > > > >> > > > >> > > > > >> > The backtraces I included are from the only threads waiting while > > > >> holding a > > > >> > lock. The other threads are providing convincing signs of good > > health in > > > >> > the trace log. > > > >> > > > >> Could you please post backstraces for all available threads? > > > >> (I hope you don't have 100 of them). > > > > > > > > > > > Hi Azat, > > > > > > As an experiment, I removed all the additional locks and set only the > > flags > > > BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE > > > > > > I get very quickly into a deadlock with the following backtraces: > > > > > > (gdb) bt > > > #0 __lll_lock_wait () at > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > > > #1 0x00007ffff7770672 in _L_lock_953 () from > > > /lib/x86_64-linux-gnu/libpthread.so.0 > > > #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x6399f0) at > > > ../nptl/pthread_mutex_lock.c:114 > > > > Hi Marco, > > > > First of all, could post exact commit's hash, to make it specific. > > > > The version I am using is from > http://sourceforge.net/projects/levent/files/libevent/libevent-2.1/libevent-2.1.5.beta.tar.gz/download > > > > > > Could you check you program under valgrind memcheck/{drd,helgrind} (but > > with you internal locks, to reduce amount of lines in report) ? > > Maybe you forgot to release lock somewhere else... > > > > I have removed all my internal locks. > > The flags used are (BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE | > BEV_OPT_DEFER_CALLBACKS). > > I enable locking: > > evbuffer_enable_locking(bufferevent_get_input(connection->bufev), NULL); > evbuffer_enable_locking(bufferevent_get_output(connection->bufev), NULL); > > The full backtrace after deadlock is: > > > (gdb) thread apply all bt > > Thread 13 (Thread 0x7fffcf7fe700 (LWP 23500)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f430) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffcf7fe700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 12 (Thread 0x7fffcffff700 (LWP 23499)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f280) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffcffff700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 11 (Thread 0x7fffecff9700 (LWP 23498)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61f0d0) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffecff9700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 10 (Thread 0x7fffed7fa700 (LWP 23497)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ef20) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffed7fa700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 9 (Thread 0x7fffedffb700 (LWP 23496)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ed70) at > workqueue.c:389 > ---Type <return> to continue, or q <return> to quit--- > #2 0x00007ffff776e182 in start_thread (arg=0x7fffedffb700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 8 (Thread 0x7fffee7fc700 (LWP 23495)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ebc0) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffee7fc700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 7 (Thread 0x7fffeeffd700 (LWP 23494)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61ea10) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffeeffd700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 6 (Thread 0x7fffef7fe700 (LWP 23493)): > #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > #1 0x00007ffff6d76292 in _workqueue_job_scheduler (data=0x61e860) at > workqueue.c:389 > #2 0x00007ffff776e182 in start_thread (arg=0x7fffef7fe700) at > pthread_create.c:312 > #3 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 5 (Thread 0x7fffeffff700 (LWP 23492)): > #0 __lll_lock_wait () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > #1 0x00007ffff7770672 in _L_lock_953 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0xf36860) at > ../nptl/pthread_mutex_lock.c:114 > #3 0x00007ffff7b9d445 in bufferevent_setcb (bufev=0xf44030, readcb=0x0, > writecb=0x4046f8 <device_write_cb>, eventcb=0x0, cbarg=0x642d00) > at bufferevent.c:372 > #4 0x000000000040411f in device_chunk_cb (device=0x642d00) at cometa.c:474 > ---Type <return> to continue, or q <return> to quit--- > #5 0x000000000040454d in device_read_cb (bev=0x642730, arg=0x642d00) at > cometa.c:612 > #6 0x00007ffff7b9dd1c in bufferevent_run_deferred_callbacks_locked > (cb=<optimized out>, arg=0x642730) at bufferevent.c:150 > #7 0x00007ffff7ba77ff in event_process_active_single_queue > (base=base@entry=0x7fffe4000900, > activeq=0x7fffe4000d50, > max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) > at event.c:1626 > #8 0x00007ffff7ba82cf in event_process_active (base=0x7fffe4000900) at > event.c:1689 > #9 event_base_loop (base=0x7fffe4000900, flags=0) at event.c:1912 > #10 0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450 > #11 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at > pthread_create.c:312 > #12 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 4 (Thread 0x7ffff4b2e700 (LWP 23491)): > #0 0x00007ffff68976a3 in epoll_wait () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe0000900, > tv=<optimized out>) at epoll.c:463 > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffe0000900, flags=0) at > event.c:1898 > #3 0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450 > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at > pthread_create.c:312 > #5 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 3 (Thread 0x7ffff532f700 (LWP 23490)): > #0 0x00007ffff68976a3 in epoll_wait () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900, > tv=<optimized out>) at epoll.c:463 > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900, flags=0) at > event.c:1898 > #3 0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450 > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at > pthread_create.c:312 > #5 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 2 (Thread 0x7ffff5b30700 (LWP 23489)): > #0 0x00007ffff68976a3 in epoll_wait () at > ../sysdeps/unix/syscall-template.S:81 > ---Type <return> to continue, or q <return> to quit--- > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900, > tv=<optimized out>) at epoll.c:463 > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900, flags=0) at > event.c:1898 > #3 0x000000000040a00e in device_loop (arg=0x0) at cometa.c:1450 > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at > pthread_create.c:312 > #5 0x00007ffff689700d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 1 (Thread 0x7ffff7fdd780 (LWP 23485)): > #0 __lll_lock_wait () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > #1 0x00007ffff7770672 in _L_lock_953 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x642a50) at > ../nptl/pthread_mutex_lock.c:114 > #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x642930) at > buffer.c:610 > #4 0x0000000000404782 in device_write_cb (bev=0xf44030, arg=0x642d00) at > cometa.c:677 > #5 0x00007ffff7b9dd46 in bufferevent_run_deferred_callbacks_locked > (cb=<optimized out>, arg=0xf44030) at bufferevent.c:154 > #6 0x00007ffff7ba77ff in event_process_active_single_queue > (base=base@entry=0x6218d0, > activeq=0x621d20, > max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) > at event.c:1626 > #7 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at > event.c:1689 > #8 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 > #9 0x000000000040a50f in main () at cometa.c:1584 > > Thanks, > > -Marco G. > > > > > > #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x6398d0) at > > > buffer.c:610 > > > #4 0x0000000000404723 in device_write_cb (bev=0x64f350, arg=0x639ca0) at > > > cometa.c:672 > > > #5 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, > > iotype=4, > > > bufev=0x64f350) at bufferevent-internal.h:369 > > > #6 bufferevent_writecb (fd=<optimized out>, event=<optimized out>, > > > arg=0x64f350) at bufferevent_sock.c:297 > > > #7 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, > > > base=0x6218d0) at event.c:1531 > > > #8 event_process_active_single_queue (base=base@entry=0x6218d0, > > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647, > > > endtime=endtime@entry=0x0) at event.c:1590 > > > #9 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at > > > event.c:1689 > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 > > > #11 0x000000000040a4b0 in main () at cometa.c:1579 > > > (gdb) t 2 > > > [Switching to thread 2 (Thread 0x7ffff5b30700 (LWP 28878))] > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > 81 ../sysdeps/unix/syscall-template.S: No such file or directory. > > > (gdb) bt > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900, > > > tv=<optimized out>) at epoll.c:463 > > > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900, flags=0) > > at > > > event.c:1898 > > > #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 > > > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at > > > pthread_create.c:312 > > > #5 0x00007ffff689700d in clone () at > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > (gdb) t 3 > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 28879))] > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > 81 in ../sysdeps/unix/syscall-template.S > > > (gdb) bt > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900, > > > tv=<optimized out>) at epoll.c:463 > > > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900, flags=0) > > at > > > event.c:1898 > > > #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 > > > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at > > > pthread_create.c:312 > > > #5 0x00007ffff689700d in clone () at > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > (gdb) t 4 > > > [Switching to thread 4 (Thread 0x7ffff4b2e700 (LWP 28880))] > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > 81 in ../sysdeps/unix/syscall-template.S > > > (gdb) bt > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > ../sysdeps/unix/syscall-template.S:81 > > > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffc0000900, > > > tv=<optimized out>) at epoll.c:463 > > > #2 0x00007ffff7ba8106 in event_base_loop (base=0x7fffc0000900, flags=0) > > at > > > event.c:1898 > > > #3 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 > > > #4 0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at > > > pthread_create.c:312 > > > #5 0x00007ffff689700d in clone () at > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > (gdb) t 5 > > > [Switching to thread 5 (Thread 0x7fffeffff700 (LWP 28881))] > > > #0 __lll_lock_wait () at > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > > > 135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file > > or > > > directory. > > > (gdb) bt > > > #0 __lll_lock_wait () at > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > > > #1 0x00007ffff7770672 in _L_lock_953 () from > > > /lib/x86_64-linux-gnu/libpthread.so.0 > > > #2 0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x64b320) at > > > ../nptl/pthread_mutex_lock.c:114 > > > #3 0x00007ffff7b9d445 in bufferevent_setcb (bufev=0x64f350, > > > readcb=0x405022 <evhttp_read_cb>, writecb=0x4051ce <evhttp_write_cb>, > > > eventcb=0x405129 <evhttp_event_cb>, cbarg=0x64e690) at > > bufferevent.c:372 > > > #4 0x0000000000405ecb in send_http_response (connection=0x64e690, > > > response_buffer=0x7fffbc0008c0) at cometa.c:977 > > > > What this function do? > > But let's start from valgrind report first. > > > > > #5 0x00000000004040ef in device_chunk_cb (device=0x639ca0) at > > cometa.c:471 > > > > Hang on.. > > > > Do you lock this device->pending->bufev somehow (or in some inner frame)? > > If so then *deadlock is possible*, iff in thread#1 evbuffer_get_length() > > called with this bufev output, and this is the only issue that can > > happens here, if I understand you snippets+backtraces correctly.
You read the paragraph above, do you? Anyway you new backtrace don't have answer for this question. Cheers, Azat. > > > > > #6 0x00000000004044ee in device_read_cb (bev=0x6396d0, arg=0x639ca0) at > > > cometa.c:607 > > > #7 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, > > iotype=2, > > > bufev=0x6396d0) at bufferevent-internal.h:366 > > > #8 bufferevent_readcb (fd=35, event=<optimized out>, arg=0x6396d0) at > > > bufferevent_sock.c:187 > > > #9 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>, > > > base=0x7fffbc000900) at event.c:1531 > > > #10 event_process_active_single_queue (base=base@entry=0x7fffbc000900, > > > activeq=0x7fffbc000d50, > > > max_to_process=max_to_process@entry=2147483647, > > endtime=endtime@entry=0x0) > > > at event.c:1590 > > > #11 0x00007ffff7ba82cf in event_process_active (base=0x7fffbc000900) at > > > event.c:1689 > > > #12 event_base_loop (base=0x7fffbc000900, flags=0) at event.c:1912 > > > #13 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445 > > > #14 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at > > > pthread_create.c:312 > > > #15 0x00007ffff689700d in clone () at > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > > (gdb) thread apply all bt > > > > > > > > Adding the BEV_OPT_DEFER_CALLBACKS seems to improve considerably and I > > > haven't got a deadlock yet. > > > > > > At some point I would have to put some locks back in for synchronization > > > though. I have occasional buffer corruption possibly from interleaved > > read > > > callbacks from different threads. > > > > > > Thanks, > > > > > > -Marco G. > > > > > > > > > > > > > Hi Azat, > > > > > > > > The other 2 threads are idle: > > > > > > > > #0 0x00007ffff68976a3 in epoll_wait () at > > > > ../sysdeps/unix/syscall-template.S:81 > > > > #1 0x00007ffff7bb19a8 in epoll_dispatch (base=0x6218d0, tv=<optimized > > > > out>) at epoll.c:463 > > > > #2 0x00007ffff7ba8106 in event_base_loop (base=0x6218d0, flags=0) at > > > > event.c:1898 > > > > #3 0x000000000040a763 in main () at cometa.c:1583 > > > > > > > > > > > >> > > > > >> > Definitely the write callback was called tens of thousand of times > > > >> before > > > >> > incurring in the deadlock, and I suspect luck or chance has little > > to do > > > >> > with the deadlock. It may have more to do with the fact that the > > read > > > >> > callback receiving a message send it down a bufferevent with the > > write > > > >> > callback handled by another thread? > > > >> > > > > > > > > Any implications having the read callback receiving a message than is > > then > > > > sent to a bufferevent and the write callback for that message being on > > a > > > > different thread? > > > > I can't imagine any problems with this use case. > > > > Even with your internal locks, the *first* lock that must be acquired -- > > bufferevent lock, and it will block if some of threads already holding > > it (there were some issues with freeing bufferevents, but they was > > before finalizers, hence in 2.1 upstream they mustn't exist anymore). > > > > Cheers, > > Azat. > > > > > > > > > > > > > >> > > > > >> > The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock > > has > > > >> not > > > >> > appeared again. Before it took about one hour of stress test to > > appear. > > > >> If > > > >> > removing the callback lock solved the problem, please help me > > understand > > > >> > what was happening in the first place and if for instance the added > > lock > > > >> > inside the callback was unnecessary or even the cause of the > > deadlock > > > >> > itself. > > > > > > > > > > > > After additional testing, I confirm that the deadlock has not appeared > > > > when setting BEV_OPT_UNLOCK_CALLBACKS. > > > > > > > > > > > > >> > Also, since enabling OpenSSL is coming next, I was wondering if you > > > >> have a > > > >> > word of wisdom or two, on how to better tackle the task at hands or > > > >> which > > > >> > version of libevent is best suited for the job. I am using the head > > of > > > >> the > > > >> > tree, or to do some debugging if that it was it takes, I would be > > okay > > > >> with > > > >> > it. I am afraid that locking the callbacks may be needed with > > OpenSSL > > > >> > enabled. > > > >> > > > >> I prefer to use the latest versions, to be up-to-date. > > > >> bufferevent_openssl doesn't require bufferevent locking - look at > > > >> bufferevent_openssl_new_impl() in bufferevent_openssl.c > > > >> > > > > > > > > Hence using BEV_OPT_UNLOCK_CALLBACKS seems the way to go anyway. > > > > > > > > Thanks, > > > > > > > > -Marco G. > > > > > > > > > > > > >> > Thanks for the useful insights. > > > >> > > > > >> > -Marco G. > > > >> > > > >> And for the next time: could you avoid top-posting please. > > > >> > > > > > > > > ps. Sorry for that. Using GMail makes it too easy to forget. > > > > > > > > > > > >> > > > >> Cheers, > > > >> Azat. > > > >> > > > >> > > > > >> > On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <[email protected]> > > > >> wrote: > > > >> > > > > >> > > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote: > > > >> > > > I am running into a deadlock with both libevent 2.0 and 2.1 > > that I > > > >> cannot > > > >> > > > explain. One thread is executing a write callback and the other > > > >> thread a > > > >> > > > read callback of a different bufferevent. Since they are both > > > >> accessing > > > >> > > the > > > >> > > > same data structure passed to the callbacks (arg=0x630c40), I am > > > >> using a > > > >> > > > rwlock in the data structure. I did not expect though the > > evbuffer > > > >> > > internal > > > >> > > > lock to interfere with the lock used to synchronize access to > > the > > > >> data > > > >> > > > structure shared between the callbacks. > > > >> > > > > > > >> > > > I am including the backtraces and the relevant portions of > > code. I > > > >> would > > > >> > > > appreciate if someone can give me any suggestion on how to work > > > >> around > > > >> > > this. > > > >> > > > > > > >> > > > Thanks, > > > >> > > > > > > >> > > > -Marco G. > > > >> > > > > > > >> > > > (gdb) t 1 > > > >> > > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))] > > > >> > > > (gdb) bt > > > >> > > > #0 __lll_lock_wait () at > > > >> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > > > >> > > > #1 0x00007ffff7770672 in _L_lock_953 () from > > > >> > > > /lib/x86_64-linux-gnu/libpthread.so.0 > > > >> > > > #2 0x00007ffff77704da in __GI___pthread_mutex_lock > > > >> (mutex=0x620200) at > > > >> > > > ../nptl/pthread_mutex_lock.c:114 > > > >> > > > #3 0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) > > at > > > >> > > > buffer.c:610 > > > >> > > > > > >> > > I don't sure that this two backtraces could have deadlock. Do you? > > > >> > > Could it call write callback again and again, but you was so lucky > > > >> that > > > >> > > you always see such backtrace? > > > >> > > > > > >> > > Maybe some other thread acquire some locks in other order? > > > >> > > (I bet to this one). > > > >> > > > > > >> > > Also did you check program under memcheck/drd/helgrind? > > > >> > > Again if you was so *lucky* that you overwrite lock of evbuffer by > > > >> your > > > >> > > own, but it is so unlikely, that you could just ignore this. > > > >> > > > > > >> > > And if you could write 50-100L sample to reproduce the issue, it > > will > > > >> be > > > >> > > very helpful. > > > >> > > > > > >> > > Cheers, > > > >> > > Azat. > > > >> > > > > > >> > > > #4 0x000000000040494b in device_write_cb (bev=0x123f640, > > > >> arg=0x630c40) > > > >> > > at > > > >> > > > cometa.c:667 > > > >> > > > #5 0x00007ffff7ba2139 in bufferevent_trigger_nolock_ > > (options=0, > > > >> > > iotype=4, > > > >> > > > bufev=0x123f640) at bufferevent-internal.h:369 > > > >> > > > #6 bufferevent_writecb (fd=<optimized out>, event=<optimized > > out>, > > > >> > > > arg=0x123f640) at bufferevent_sock.c:297 > > > >> > > > #7 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized > > out>, > > > >> > > > base=0x6218d0) at event.c:1531 > > > >> > > > #8 event_process_active_single_queue (base=base@entry > > =0x6218d0, > > > >> > > > activeq=0x621d20, max_to_process=max_to_process@entry= > > 2147483647, > > > >> > > > endtime=endtime@entry=0x0) at event.c:1590 > > > >> > > > #9 0x00007ffff7ba82cf in event_process_active (base=0x6218d0) > > at > > > >> > > > event.c:1689 > > > >> > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912 > > > >> > > > #11 0x000000000040a700 in main () at cometa.c:1571 > > > >> > > > > > > >> > > > > > > >> > > > bufferevent 0x123f640 write callback: > > > >> > > > 665 pthread_rwlock_wrlock(&device->lock); > > > >> > > > 666 if ((device->pending && device->state == WAITING_STATE) > > || > > > >> > > > device->ws_pending) { > > > >> > > > 667 int l = evbuffer_get_length(device_bev); > > > >> > > > >> How device_dev is installed? > > > >> (evbuffer_new()/evbuffer_enable_locking()) > > > >> > > > >> > > > 668 pthread_rwlock_unlock(&device->lock); > > > >> > > > 669 output = bufferevent_get_output(device->pending->bufev); > > > >> > > > 670 if (evbuffer_get_length(output) == 0) { > > > >> > > > 671 /* free and close the connection */ > > > >> > > > 672 pthread_rwlock_wrlock(&device->lock); > > > >> > > > 673 free_http_connection(device->pending); > > > >> > > > 674 device->pending = NULL; > > > >> > > > 675 device->state = IDLE_STATE; > > > >> > > > 676 device->chunk_length = 0; > > > >> > > > 677 device->ntoread = -1; > > > >> > > > 678 > > > >> > > > 679 evbuffer_drain (device_bev, l); > > > >> > > > 680 sem_post(&device->active); > > > >> > > > 681 pthread_rwlock_unlock(&device->lock); > > > >> > > > 682 } > > > >> > > > 683 } else { > > > >> > > > 684 pthread_rwlock_unlock(&device->lock); > > > >> > > > 685 } > > > >> > > > > > > >> > > > > > > >> > > > (gdb) t 3 > > > >> > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))] > > > >> > > > (gdb) bt > > > >> > > > #0 pthread_rwlock_wrlock () at > > > >> > > > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85 > > > >> > > > #1 0x0000000000404667 in device_read_cb (bev=0x630730, > > > >> arg=0x630c40) at > > > >> > > > cometa.c:604 > > > >> > > > #2 0x00007ffff7ba2324 in bufferevent_trigger_nolock_ > > (options=0, > > > >> > > iotype=2, > > > >> > > > bufev=0x630730) at bufferevent-internal.h:366 > > > >> > > > #3 bufferevent_readcb (fd=34, event=<optimized out>, > > arg=0x630730) > > > >> at > > > >> > > > bufferevent_sock.c:187 > > > >> > > > #4 0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized > > out>, > > > >> > > > base=0x7fffe8000900) at event.c:1531 > > > >> > > > #5 event_process_active_single_queue (base=base@entry > > > >> =0x7fffe8000900, > > > >> > > > activeq=0x7fffe8000d50, > > > >> > > > max_to_process=max_to_process@entry=2147483647, > > > >> > > endtime=endtime@entry=0x0) > > > >> > > > at event.c:1590 > > > >> > > > #6 0x00007ffff7ba82cf in event_process_active > > > >> (base=0x7fffe8000900) at > > > >> > > > event.c:1689 > > > >> > > > #7 event_base_loop (base=0x7fffe8000900, flags=0) at > > event.c:1912 > > > >> > > > #8 0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437 > > > >> > > > #9 0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at > > > >> > > > pthread_create.c:312 > > > >> > > > #10 0x00007ffff689700d in clone () at > > > >> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > > >> > > > > > > >> > > > bufferevent 0x630730 read call back: > > > >> > > > 604 pthread_rwlock_wrlock(&device->lock); > > > >> > > > 605 if (device->state != WAITING_STATE) { > > > >> > > > 606 > > > >> > > > 607 device->chunk_length = 0; > > > >> > > > 608 device->ntoread = -1; > > > >> > > > 609 /* in case of state == WAITING_STATE the counters reset > > is > > > >> done > > > >> > > in > > > >> > > > the write callback after sending the response */ > > > >> > > > 610 } > > > >> > > > 611 pthread_rwlock_unlock(&device->lock); > > > >> > > *********************************************************************** > > > >> To unsubscribe, send an e-mail to [email protected] with > > > >> unsubscribe libevent-users in the body. > > > >> > > > > > > > > > > *********************************************************************** > > To unsubscribe, send an e-mail to [email protected] with > > unsubscribe libevent-users in the body. > > -- Respectfully Azat Khuzhin *********************************************************************** To unsubscribe, send an e-mail to [email protected] with unsubscribe libevent-users in the body.
