[ 
https://issues.apache.org/jira/browse/QPID-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227319#comment-16227319
 ] 

Chris Richardson commented on QPID-7991:
----------------------------------------

I have a theory about where the null shared_ptr may be coming from. Looking at 
these lines of code (from src/qpid/broker/Link.cpp:462):

        Bridges::iterator removed = std::remove_if(
            active.begin(), active.end(), boost::bind(&Bridge::isDetached, _1));
        for (Bridges::iterator i = removed; i != active.end(); ++i) {
            Bridge::shared_ptr bridge = *i;

is it possible that the iterator holds only a pointer to the shared_ptr in the 
"active" vector (which would not increment the shared_ptr ref count) and that 
when it's removed from the vector the ref count may hit zero before the 
iterator is referenced and assigned to the "bridge" variable?

Note that the invocation of the isDetached predicate is successfully executed 
on the Bridge instance that subsequently transpired to be null, so presumably 
it was not null at that time... 


> Segfault in broker while processing active bridges
> --------------------------------------------------
>
>                 Key: QPID-7991
>                 URL: https://issues.apache.org/jira/browse/QPID-7991
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Broker
>    Affects Versions: qpid-cpp-1.36.0, qpid-cpp-1.37.0
>         Environment: Ubuntu 17.10 x86_64, gcc 7.
>            Reporter: Chris Richardson
>            Priority: Critical
>             Fix For: qpid-cpp-1.37.0
>
>         Attachments: segfault stack trace, segfault-fix.patch, 
> segfault-repoduce.tar.gz
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Segfault occurs on a brackground thread within about 5-10 seconds of broker 
> startup at src/qpid/broker/Link.cpp:465. [^segfault stack trace] attached, 
> frames #3 and #5 are of particular relevance.
> The unchecked Bridge::shared_ptr derived from the iterator is null and the 
> invocation of bridge->closed() triggers the segfault. Adding a simple null 
> check (as per attached [^segfault-fix.patch]) fixes the segfault but not the 
> underlying reason for the null pointer. 
> The segfault appears to be related to how a second broker (henceforth 
> "broker1") is configured; this is the one to which the links are established. 
> Without broker1, the "segfaulting broker" (aka "broker2") does not do its 
> thing. It may be that broker1 returns invalid data to broker2 but this is not 
> in the scope of this bug report, which focuses on the segfault. 
> h2. Reproduce
> Unfortunately the steps to  arrive at this situation are not clear so the 
> reproduce is a bit hacky - the data directory, config file and some certs for 
> the two brokers are attached as a tarball in the hope that they can be 
> arranged in such a way as to provide a reproduce in lieu of a purely 
> step-based procedure.
> Steps to reproduce:
> * Temporarily add a DNS alias to the local machine of "octopussy" (necessary 
> due to cert config and durable link config in broker2's data store)
> * Extract the attached [^segfault-repoduce.tar.gz] to an empty directory 
> (assumed to be cwd)
> * Start broker1 with "qpidd --config broker1/qpidd.conf"
> * In another shell with the same cwd, start broker2 with "qpidd --config 
> broker2/qpidd.conf"
> * Observe segfault in broker2 after 5-10 seconds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to