On Wed, Feb 18, 2015 at 8:43 AM, Marco Graziano <[email protected]>
wrote:

> On Tue, Feb 17, 2015 at 4:32 AM, Azat Khuzhin <[email protected]> wrote:
>
>> On Thu, Feb 12, 2015 at 09:16:20AM -0800, Marco Graziano wrote:
>> > Hi Azat,
>>
>> Hi Marco,
>>
>> >
>> > The backtraces I included are from the only threads waiting while
>> holding a
>> > lock. The other threads are providing convincing signs of good health in
>> > the trace log.
>>
>> Could you please post backstraces for all available threads?
>> (I hope you don't have 100 of them).
>
>
Hi Azat,

As an experiment, I removed all the additional locks and set only the flags
 BEV_OPT_CLOSE_ON_FREE |  BEV_OPT_THREADSAFE

I get very quickly into a deadlock with the following backtraces:

(gdb) bt
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff7770672 in _L_lock_953 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x6399f0) at
../nptl/pthread_mutex_lock.c:114
#3  0x00007ffff7b9686e in evbuffer_get_length (buffer=0x6398d0) at
buffer.c:610
#4  0x0000000000404723 in device_write_cb (bev=0x64f350, arg=0x639ca0) at
cometa.c:672
#5  0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0, iotype=4,
bufev=0x64f350) at bufferevent-internal.h:369
#6  bufferevent_writecb (fd=<optimized out>, event=<optimized out>,
arg=0x64f350) at bufferevent_sock.c:297
#7  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
base=0x6218d0) at event.c:1531
#8  event_process_active_single_queue (base=base@entry=0x6218d0,
activeq=0x621d20, max_to_process=max_to_process@entry=2147483647,
    endtime=endtime@entry=0x0) at event.c:1590
#9  0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
event.c:1689
#10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
#11 0x000000000040a4b0 in main () at cometa.c:1579
(gdb) t 2
[Switching to thread 2 (Thread 0x7ffff5b30700 (LWP 28878))]
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7ffff0000900,
tv=<optimized out>) at epoll.c:463
#2  0x00007ffff7ba8106 in event_base_loop (base=0x7ffff0000900, flags=0) at
event.c:1898
#3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
#4  0x00007ffff776e182 in start_thread (arg=0x7ffff5b30700) at
pthread_create.c:312
#5  0x00007ffff689700d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) t 3
[Switching to thread 3 (Thread 0x7ffff532f700 (LWP 28879))]
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffe8000900,
tv=<optimized out>) at epoll.c:463
#2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffe8000900, flags=0) at
event.c:1898
#3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
#4  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
pthread_create.c:312
#5  0x00007ffff689700d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) t 4
[Switching to thread 4 (Thread 0x7ffff4b2e700 (LWP 28880))]
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007ffff68976a3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x7fffc0000900,
tv=<optimized out>) at epoll.c:463
#2  0x00007ffff7ba8106 in event_base_loop (base=0x7fffc0000900, flags=0) at
event.c:1898
#3  0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
#4  0x00007ffff776e182 in start_thread (arg=0x7ffff4b2e700) at
pthread_create.c:312
#5  0x00007ffff689700d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) t 5
[Switching to thread 5 (Thread 0x7fffeffff700 (LWP 28881))]
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or
directory.
(gdb) bt
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ffff7770672 in _L_lock_953 () from
/lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007ffff77704da in __GI___pthread_mutex_lock (mutex=0x64b320) at
../nptl/pthread_mutex_lock.c:114
#3  0x00007ffff7b9d445 in bufferevent_setcb (bufev=0x64f350,
readcb=0x405022 <evhttp_read_cb>, writecb=0x4051ce <evhttp_write_cb>,
    eventcb=0x405129 <evhttp_event_cb>, cbarg=0x64e690) at bufferevent.c:372
#4  0x0000000000405ecb in send_http_response (connection=0x64e690,
response_buffer=0x7fffbc0008c0) at cometa.c:977
#5  0x00000000004040ef in device_chunk_cb (device=0x639ca0) at cometa.c:471
#6  0x00000000004044ee in device_read_cb (bev=0x6396d0, arg=0x639ca0) at
cometa.c:607
#7  0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0, iotype=2,
bufev=0x6396d0) at bufferevent-internal.h:366
#8  bufferevent_readcb (fd=35, event=<optimized out>, arg=0x6396d0) at
bufferevent_sock.c:187
#9  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
base=0x7fffbc000900) at event.c:1531
#10 event_process_active_single_queue (base=base@entry=0x7fffbc000900,
activeq=0x7fffbc000d50,
    max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
at event.c:1590
#11 0x00007ffff7ba82cf in event_process_active (base=0x7fffbc000900) at
event.c:1689
#12 event_base_loop (base=0x7fffbc000900, flags=0) at event.c:1912
#13 0x0000000000409faf in device_loop (arg=0x0) at cometa.c:1445
#14 0x00007ffff776e182 in start_thread (arg=0x7fffeffff700) at
pthread_create.c:312
#15 0x00007ffff689700d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Adding the BEV_OPT_DEFER_CALLBACKS seems to improve considerably and I
haven't got a deadlock yet.

At some point I would have to put some locks back in for synchronization
though. I have occasional buffer corruption possibly from interleaved read
callbacks from different threads.

Thanks,

-Marco G.



> Hi Azat,
>
> The other 2 threads are idle:
>
> #0  0x00007ffff68976a3 in epoll_wait () at
> ../sysdeps/unix/syscall-template.S:81
> #1  0x00007ffff7bb19a8 in epoll_dispatch (base=0x6218d0, tv=<optimized
> out>) at epoll.c:463
> #2  0x00007ffff7ba8106 in event_base_loop (base=0x6218d0, flags=0) at
> event.c:1898
> #3  0x000000000040a763 in main () at cometa.c:1583
>
>
>> >
>> > Definitely the write callback was called tens of thousand of times
>> before
>> > incurring in the deadlock, and I suspect luck or chance has little to do
>> > with the deadlock. It may have more to do with the fact that the read
>> > callback receiving a message send it down a bufferevent with the write
>> > callback handled by another thread?
>>
>
> Any implications having the read callback receiving a message than is then
> sent to a bufferevent and the write callback for that message being on a
> different thread?
>
>
>> >
>> > The fact is that after adding BEV_OPT_UNLOCK_CALLBACKS the deadlock has
>> not
>> > appeared again. Before it took about one hour of stress test to appear.
>> If
>> > removing the callback lock solved the problem, please help me understand
>> > what was happening in the first place and if for instance the added lock
>> > inside the callback was unnecessary or even the cause of the deadlock
>> > itself.
>
>
> After additional testing, I confirm that the deadlock has not appeared
> when setting BEV_OPT_UNLOCK_CALLBACKS.
>
> >
>> > Also, since enabling OpenSSL is coming next, I was wondering if you
>> have a
>> > word of wisdom or two, on how to better tackle the task at hands or
>> which
>> > version of libevent is best suited for the job. I am using the head of
>> the
>> > tree, or to do some debugging if that it was it takes, I would be okay
>> with
>> > it. I am afraid that locking the callbacks may be needed with OpenSSL
>> > enabled.
>>
>> I prefer to use the latest versions, to be up-to-date.
>> bufferevent_openssl doesn't require bufferevent locking - look at
>> bufferevent_openssl_new_impl() in bufferevent_openssl.c
>>
>
> Hence using  BEV_OPT_UNLOCK_CALLBACKS seems the way to go anyway.
>
> Thanks,
>
> -Marco G.
>
> >
>> > Thanks for the useful insights.
>> >
>> > -Marco G.
>>
>> And for the next time: could you avoid top-posting please.
>>
>
> ps. Sorry for that. Using GMail makes it too easy to forget.
>
>
>>
>> Cheers,
>> Azat.
>>
>> >
>> > On Wed, Feb 11, 2015 at 3:11 PM, Azat Khuzhin <[email protected]>
>> wrote:
>> >
>> > > On Tue, Feb 10, 2015 at 09:06:40PM -0800, Marco Graziano wrote:
>> > > > I am running into a deadlock with both libevent 2.0 and 2.1 that I
>> cannot
>> > > > explain. One thread is executing a write callback and the other
>> thread a
>> > > > read callback of a different bufferevent. Since they are both
>> accessing
>> > > the
>> > > > same data structure passed to the callbacks (arg=0x630c40), I am
>> using a
>> > > > rwlock in the data structure. I did not expect though the evbuffer
>> > > internal
>> > > > lock to interfere with the lock used to synchronize access to the
>> data
>> > > > structure shared between the callbacks.
>> > > >
>> > > > I am including the backtraces and the relevant portions of code. I
>> would
>> > > > appreciate if someone can give me any suggestion on how to work
>> around
>> > > this.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > -Marco G.
>> > > >
>> > > > (gdb) t 1
>> > > > [Switching to thread 1 (Thread 0x7ffff7fdd780 (LWP 12008))]
>> > > > (gdb) bt
>> > > > #0  __lll_lock_wait () at
>> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> > > > #1  0x00007ffff7770672 in _L_lock_953 () from
>> > > > /lib/x86_64-linux-gnu/libpthread.so.0
>> > > > #2  0x00007ffff77704da in __GI___pthread_mutex_lock
>> (mutex=0x620200) at
>> > > > ../nptl/pthread_mutex_lock.c:114
>> > > > #3  0x00007ffff7b9686e in evbuffer_get_length (buffer=0x630930) at
>> > > > buffer.c:610
>> > >
>> > > I don't sure that this two backtraces could have deadlock. Do you?
>> > > Could it call write callback again and again, but you was so lucky
>> that
>> > > you always see such backtrace?
>> > >
>> > > Maybe some other thread acquire some locks in other order?
>> > > (I bet to this one).
>> > >
>> > > Also did you check program under memcheck/drd/helgrind?
>> > > Again if you was so *lucky* that you overwrite lock of evbuffer by
>> your
>> > > own, but it is so unlikely, that you could just ignore this.
>> > >
>> > > And if you could write 50-100L sample to reproduce the issue, it will
>> be
>> > > very helpful.
>> > >
>> > > Cheers,
>> > > Azat.
>> > >
>> > > > #4  0x000000000040494b in device_write_cb (bev=0x123f640,
>> arg=0x630c40)
>> > > at
>> > > > cometa.c:667
>> > > > #5  0x00007ffff7ba2139 in bufferevent_trigger_nolock_ (options=0,
>> > > iotype=4,
>> > > > bufev=0x123f640) at bufferevent-internal.h:369
>> > > > #6  bufferevent_writecb (fd=<optimized out>, event=<optimized out>,
>> > > > arg=0x123f640) at bufferevent_sock.c:297
>> > > > #7  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
>> > > > base=0x6218d0) at event.c:1531
>> > > > #8  event_process_active_single_queue (base=base@entry=0x6218d0,
>> > > > activeq=0x621d20, max_to_process=max_to_process@entry=2147483647,
>> > > >     endtime=endtime@entry=0x0) at event.c:1590
>> > > > #9  0x00007ffff7ba82cf in event_process_active (base=0x6218d0) at
>> > > > event.c:1689
>> > > > #10 event_base_loop (base=0x6218d0, flags=0) at event.c:1912
>> > > > #11 0x000000000040a700 in main () at cometa.c:1571
>> > > >
>> > > >
>> > > > bufferevent 0x123f640 write callback:
>> > > > 665    pthread_rwlock_wrlock(&device->lock);
>> > > > 666    if ((device->pending && device->state == WAITING_STATE) ||
>> > > > device->ws_pending) {
>> > > > 667     int l = evbuffer_get_length(device_bev);
>>
>> How device_dev is installed?
>> (evbuffer_new()/evbuffer_enable_locking())
>>
>> > > > 668     pthread_rwlock_unlock(&device->lock);
>> > > > 669     output = bufferevent_get_output(device->pending->bufev);
>> > > > 670 if (evbuffer_get_length(output) == 0) {
>> > > > 671 /* free and close the connection */
>> > > > 672 pthread_rwlock_wrlock(&device->lock);
>> > > > 673     free_http_connection(device->pending);
>> > > > 674     device->pending = NULL;
>> > > > 675     device->state = IDLE_STATE;
>> > > > 676   device->chunk_length = 0;
>> > > > 677     device->ntoread = -1;
>> > > > 678
>> > > > 679     evbuffer_drain (device_bev, l);
>> > > > 680   sem_post(&device->active);
>> > > > 681   pthread_rwlock_unlock(&device->lock);
>> > > > 682     }
>> > > > 683    } else {
>> > > > 684     pthread_rwlock_unlock(&device->lock);
>> > > > 685    }
>> > > >
>> > > >
>> > > > (gdb) t 3
>> > > > [Switching to thread 3 (Thread 0x7ffff532f700 (LWP 12013))]
>> > > > (gdb) bt
>> > > > #0  pthread_rwlock_wrlock () at
>> > > > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
>> > > > #1  0x0000000000404667 in device_read_cb (bev=0x630730,
>> arg=0x630c40) at
>> > > > cometa.c:604
>> > > > #2  0x00007ffff7ba2324 in bufferevent_trigger_nolock_ (options=0,
>> > > iotype=2,
>> > > > bufev=0x630730) at bufferevent-internal.h:366
>> > > > #3  bufferevent_readcb (fd=34, event=<optimized out>, arg=0x630730)
>> at
>> > > > bufferevent_sock.c:187
>> > > > #4  0x00007ffff7ba7a19 in event_persist_closure (ev=<optimized out>,
>> > > > base=0x7fffe8000900) at event.c:1531
>> > > > #5  event_process_active_single_queue (base=base@entry
>> =0x7fffe8000900,
>> > > > activeq=0x7fffe8000d50,
>> > > >     max_to_process=max_to_process@entry=2147483647,
>> > > endtime=endtime@entry=0x0)
>> > > > at event.c:1590
>> > > > #6  0x00007ffff7ba82cf in event_process_active
>> (base=0x7fffe8000900) at
>> > > > event.c:1689
>> > > > #7  event_base_loop (base=0x7fffe8000900, flags=0) at event.c:1912
>> > > > #8  0x000000000040a1ff in device_loop (arg=0x0) at cometa.c:1437
>> > > > #9  0x00007ffff776e182 in start_thread (arg=0x7ffff532f700) at
>> > > > pthread_create.c:312
>> > > > #10 0x00007ffff689700d in clone () at
>> > > > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>> > > >
>> > > > bufferevent 0x630730 read call back:
>> > > > 604 pthread_rwlock_wrlock(&device->lock);
>> > > > 605    if (device->state != WAITING_STATE) {
>> > > > 606
>> > > > 607   device->chunk_length = 0;
>> > > > 608     device->ntoread = -1;
>> > > > 609     /* in case of state == WAITING_STATE the counters reset is
>> done
>> > > in
>> > > > the write callback after sending the response */
>> > > > 610    }
>> > > > 611 pthread_rwlock_unlock(&device->lock);
>> ***********************************************************************
>> To unsubscribe, send an e-mail to [email protected] with
>> unsubscribe libevent-users    in the body.
>>
>
>

Reply via email to