Cliff Jansen commented on PROTON-1999:

Jeremy: I did not fully understand the scenario you have laid out but I think 
you have answered your own question:

   "Both threads are manipulating reference counts of objects, and I suspect a 
race condition. "

About 90% of the threading information in cpp/docs/mt.md is aimed at preventing 
exactly this kind of cross-thread reference counting, which never ends well.

What trips most people up is that connection::~connection() is just as 
thread-unsafe as connection::open_session().  Usually this happens because some 
shared object passed between threads has an embedded proton::connection object.  
Calls to connection::foo() are carefully scrutinized for obeying "the rules", 
but the destructor is often left to the whims of a "shared pointer last 
reference", which only drops the last reference in the correct thread 50% of 
the time.

But it can also happen if a user object with an embedded Proton object is 
copied in the wrong thread, or the Proton object is passed by value (copy!) 
instead of by reference to a method in the wrong thread.
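
To make the copy/destructor point concrete, here is a minimal sketch in plain 
standard C++.  It does not use Proton at all: "handle" and "counted_object" are 
hypothetical stand-ins for a Proton handle such as proton::connection, assuming 
only what is said above, namely that copying or destroying a handle adjusts a 
reference count that is not safe to touch from another thread.  It simply 
counts how often the count is touched.

```cpp
#include <cassert>

// Hypothetical stand-in for the object behind a Proton handle.
struct counted_object {
    int refs = 0;         // deliberately NOT atomic, to mirror the hazard
    int adjustments = 0;  // how many times any handle touched the count
};

// Hypothetical stand-in for a handle type such as proton::connection.
class handle {
  public:
    explicit handle(counted_object& o) : obj_(&o) { incref(); }
    handle(const handle& other) : obj_(other.obj_) { incref(); }
    handle& operator=(const handle&) = delete;
    ~handle() { --obj_->refs; ++obj_->adjustments; }

  private:
    void incref() { ++obj_->refs; ++obj_->adjustments; }
    counted_object* obj_;
};

void by_value(handle) {}             // copies: one increment, one decrement
void by_reference(const handle&) {}  // no copy: the count is never touched

// Number of reference count adjustments caused by one by-value call.
int adjustments_for_by_value_call() {
    counted_object obj;
    handle h(obj);
    int before = obj.adjustments;
    by_value(h);
    return obj.adjustments - before;
}

// Number of reference count adjustments caused by one by-reference call.
int adjustments_for_by_reference_call() {
    counted_object obj;
    handle h(obj);
    int before = obj.adjustments;
    by_reference(h);
    return obj.adjustments - before;
}
```

In this toy model the by-value call touches the count twice and the 
by-reference call not at all.  If the by-value call happened in the wrong 
thread, both of those adjustments would race with the thread that owns the 
object.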

In your specific case, I wonder if activities (destructors?) that are happening 
in a non-container thread need to be passed to a relevant work_queue instead, 
before the container::stop() is invoked.  That's mostly a guess at this point 
and may be at best a "treat the symptom" suggestion as opposed to good design.
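
As a rough illustration of that guess, the sketch below hands the last 
reference to the owning thread instead of letting it drop wherever it happens 
to.  It is plain standard C++ with no Proton in it: "toy_work_queue" and 
"toy_connection" are hypothetical stand-ins, and with Proton the hand-off would 
presumably go through the connection's own work queue, the same 
work_queue()->add(...) call used in the report.  The only thing it demonstrates 
is that the destructor then runs on the thread that drains the queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a work queue: jobs run on whichever thread calls run().
class toy_work_queue {
  public:
    void add(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push_back(std::move(fn));
        }
        cv_.notify_one();
    }
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopped_ = true;
        }
        cv_.notify_all();
    }
    // Runs queued jobs until stop() has been called and the queue is drained.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopped_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopped and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // executed outside the lock, on the draining thread
        }
    }

  private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stopped_ = false;
};

// Hypothetical stand-in for an object whose destructor must run on its owner thread.
struct toy_connection {
    std::thread::id* destroyed_on;
    explicit toy_connection(std::thread::id* out) : destroyed_on(out) {}
    ~toy_connection() { *destroyed_on = std::this_thread::get_id(); }
};

// Returns true if the object was destroyed on the thread that drains the queue.
bool destroyed_on_owner_thread() {
    toy_work_queue queue;
    std::thread::id owner_id;
    std::thread::id destroyed_on;
    std::thread owner([&] {
        owner_id = std::this_thread::get_id();
        queue.run();
    });

    auto conn = std::make_shared<toy_connection>(&destroyed_on);
    // Move (not copy) the last reference into the job, so the final release
    // happens inside the job on the owner thread.
    queue.add([c = std::move(conn)]() mutable { c.reset(); });

    queue.stop();  // analogous to stopping only after the hand-off is queued
    owner.join();
    return destroyed_on == owner_id;
}
```

Note the ordering: the release is queued before stop() is called, and the queue 
drains pending jobs before run() returns.  Whether the real container gives the 
same guarantee for work queued just before container::stop() is something to 
check rather than assume.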

> [c] Crash in pn_connection_finalize
> -----------------------------------
>                 Key: PROTON-1999
>                 URL: https://issues.apache.org/jira/browse/PROTON-1999
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: cpp-binding, proton-c
>    Affects Versions: proton-c-0.26.0
>         Environment: Linux 64-bits (Ubuntu 16.04 and Oracle Linux 7.4)
>            Reporter: Olivier Delbeke
>            Assignee: Cliff Jansen
>            Priority: Major
>         Attachments: call_stack.txt, example2.cpp, log.txt, main.cpp, 
> run_qpid-broker.sh
>
> Here is my situation: I have several proton::containers (~20). 
> Each one has its own proton::messaging_handler, and handles one 
> proton::connection to a local qpid-broker (everything runs on the same Linux 
> machine).
> 20 x ( one container with one handler with one connection with one link)
> Some containers/connections/handlers work in send mode; they have one link 
> that is a proton::sender.
> Some containers/connections/handlers work in receive mode; they have one 
> link that is a proton::receiver. Each time they receive an input message, 
> they do some processing on it, and finally add a "sender->send()" task to the 
> work queue of some sender handlers (by calling work_queue()->add( [=] { 
> sender->send(msg); } ) as shown in the multi-threading examples).
> This works fine for some time (tens of thousands of messages, several minutes 
> or hours), but eventually crashes, either with a SEGFAULT (when the 
> qpid-proton lib is compiled in release mode) or with an assert (in debug 
> mode), in qpid-proton/c/src/core/engine.c line 483, 
> assert(!conn->transport->referenced) in function pn_connection_finalize().
> The proton logs (activated with export PN_TRACE_FRM=1) do not show anything 
> abnormal (no loss of connection, no rejection of messages, no timeouts, ...).
> As the connection is not closed, I wonder why pn_connection_finalize() would 
> be called in the first place.
> I have attached the logs and the call trace.
> Happens on 0.26.0 but also reproduced with the latest master (Jan 28, 2019).
