[ 
https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467772#comment-16467772
 ] 

Cliff Jansen commented on PROTON-1842:
--------------------------------------

Thank-you for this additional info.

Yes, these look like the same types of stack traces after a dust up.

If they truly are, for me the pconnection_done thread (T2) got the R socket 
event, the last event batch, begin_close and free.

The competing thread (T4) got the RW socket event, then failed in numerous ways 
depending on what was in the torn down or reused freed memory.

Catastrophic stuff happens for about 1 in 10000 connections for a debug build 
on a middle-aged 4c/8t desktop machine.

> [c] Dispatch/Proton crashes when opening/closing connections
> ------------------------------------------------------------
>
>                 Key: PROTON-1842
>                 URL: https://issues.apache.org/jira/browse/PROTON-1842
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.22.0
>            Reporter: Chuck Rolke
>            Priority: Major
>         Attachments: helloworld.cpp, race.tsan, race.vg
>
>
> Using proton cpp example code that is modified to open and close connections 
> by the thousands in the main loop and having the event loop short circuit any 
> messaging with:
> {{  void on_connection_open(proton::connection& c) {}}
> {{      c.close();}}
> {{  }}}
> and then directing this client example to a dispatch router 1.1.0. Eventually 
> (after 100,000 to 1,000,000 connection open/closes) the router crashes with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:466: 
> wake_pop_front: Assertion `p->wakes_in_progress' failed.}}
> and with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014: 
> proactor_do_epoll: Assertion `ee->type == PCONNECTION_TIMER' failed.}}
> This issue seems to happen only with qpid-dispatch accepting the open/close 
> event stream. Proton cpp example _server_direct_ and c example _direct_ work 
> properly with the same open/close event stream mounting into the 10s of 
> millions of connections.
> A core dump backtrace with the PCONNECTION_TIMER failure reads as:
> {{(gdb) bt}}
> {{#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51}}
> {{#1  0x00007f795c712c41 in __GI_abort () at abort.c:79}}
> {{#2  0x00007f795c709f7a in __assert_fail_base (fmt=0x7f795c85a260 
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=assertion@entry=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", }}
> {{    file=file@entry=0x7f795d72de98 
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=line@entry=2014, }}
> {{    function=function@entry=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> 
> "proactor_do_epoll") at assert.c:92}}
> {{#3  0x00007f795c709ff2 in __GI___assert_fail (assertion=0x7f795d72e15a 
> "ee->type == PCONNECTION_TIMER", file=0x7f795d72de98 
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=2014, }}
> {{    function=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") 
> at assert.c:101}}
> {{#4  0x00007f795d72d29f in proactor_do_epoll (p=0x26b7310, can_block=true) 
> at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014}}
> {{#5  0x00007f795d72d30e in pn_proactor_wait (p=0x26b7310) at 
> /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2030}}
> {{#6  0x00007f795dbe89ad in thread_run (arg=0x26be750) at 
> /home/chug/git/qpid-dispatch/src/server.c:946}}
> {{#7  0x00007f795d50e50b in start_thread (arg=0x7f794f486700) at 
> pthread_create.c:465}}
> {{#8  0x00007f795c7d216f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to