Hassan Raza created AMQCPP-646:
----------------------------------

             Summary: Segmentation fault in TransportFilter & 
FailoverTransportListener
                 Key: AMQCPP-646
                 URL: https://issues.apache.org/jira/browse/AMQCPP-646
             Project: ActiveMQ C++ Client
          Issue Type: Bug
          Components: Transports
    Affects Versions: 3.9.4, 3.9.3
         Environment: We're using Red Hat Enterprise 7.6
            Reporter: Hassan Raza
            Assignee: Timothy Bish
         Attachments: code.patch

We're getting the following segmentation faults on a productive system with 160 
cores on a fairly regular basis:
{quote}-|/export/..../lib/libactivemq-cpp.so.19

{{-|[7] : 
activemq::transport::failover::FailoverTransportListener::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0x175}}
{{ -|[8] : 
activemq::transport::TransportFilter::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0xc2}}
{{ -|[9] : 
activemq::wireformat::openwire::OpenWireFormatNegotiator::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0xe6}}
{{ -|[10] : 
activemq::transport::TransportFilter::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0xc2}}
{{ -|[11] : 
activemq::transport::inactivity::InactivityMonitor::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0x292}}
{{ -|[12] : 
activemq::transport::TransportFilter::onCommand(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0xc2}}
{{ -|[13] : 
activemq::transport::IOTransport::(fire(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>)+0x9c}}
{{ -|[14] : activemq::transport::IOTransport::run()+0xb1}}
{quote}
We've narrowed the problem down to the call to 
Transport::setTransportListener(NULL) in TransportFilter::close() - it seems to 
be the same race condition reported with:

https://jira.apache.org/jira/browse/AMQCPP-534

and

https://jira.apache.org/jira/browse/AMQCPP-583

 I'm guessing the fix will also be similar - a basic patch suggestion is 
attached. You may want to put the static DefaultTransportListener in a common 
place.

However, what is more worrying is that there appear to be several race 
conditions related to FailoverTransport::getTransportListener(). It provides 
mutex based access to the current listener, but all subsequent access to the 
raw pointer is unprotected. We had a concrete case in 
FailoverTransportListener::onCommand(). There are two calls to 
parent->getTransportListener() - one for the null ptr check, the second to 
invoke the command. In our seg fault, we observed that the first call returned 
a valid ptr, but the second call didn't (or returned null). Our explanation is 
that since getting and setting the TransportListener is protected by a mutex, 
another thread was able to get a hold of the mutex between the two calls and 
set it to FailoverTransport::setTransportListener(NULL).

Perhaps a much better way of handling this situation would be to use shared_ptr 
instead of a raw pointer for:

{{FailoverTransportImpl::TransportListener* 
transportListener;}}{{std::shared_ptr<TransportListener> 
FailoverTransport::getTransportListener()}}{{FailoverTransport::setTransportListener(std::shared_ptr<TransportListener>
 listener)}}

This way, a thread could set a new listener, and any threads still referencing 
the old listener would gracefully fail. We looked into providing a patch for 
that, but realized that would be a much bigger change involving 
ActiveMQConnection deriving from std::enable_shared_from_this, which would also 
mean replacing decaf::lang::Pointer by std::shared_ptr... Without knowing why 
decaf::lang::Pointer exists in the first place, it would be difficult to 
proceed without additional clarifications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to