[ https://issues.apache.org/jira/browse/AMQCPP-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Evans updated AMQCPP-685:
----------------------------------
    Description: 
We have encountered a regularly occurring issue when using activemq-cpp 3.9.4. Our 
application acts as a producer to a message queue and connects to an ActiveMQ 5.16.0 
broker using the failover transport.

We have not been able to rule out other configurations as a factor because, while 
the issue occurs regularly, we have not yet found a pattern that lets us reproduce 
it reliably.

We end up segfaulting because we only catch CMSException, and this exception comes 
from decaf. We considered broadening our exception handling, but we aren't sure what 
the impact on subsequently produced messages would be if we simply caught and logged 
this exception; since we have not been able to reproduce the crash on demand, we have 
not been able to test that behavior. For now, we consider a crash a better outcome 
than potentially bad messages.

I am a bit out of my element here, so let me know if any additional context or 
supporting material would help.

The following internal code of ours invokes 
{{activemq::core::ActiveMQProducer::send()}}:
{code:cpp}
bool AMQueue::send(Message* message) {
    try {
        // _producer wraps the CMS producer; per the backtrace below this call
        // ends up in activemq::core::ActiveMQProducer::send().
        if (_producer) {
            _producer->send(message);
            return true;
        }
    } catch (CMSException& e) {
        // Only cms::CMSException is handled here; decaf-level exceptions escape.
        log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("ActiveMQ");
        LOG4CXX_ERROR(logger, e.getMessage());
    }
    return false;
}
{code}
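As noted above, we have considered widening this handler but have not deployed such a 
change. The following is only a hypothetical sketch of what we had in mind; it assumes 
that catching {{decaf::lang::Exception}} (a base class of the {{NoSuchElementException}} 
in the trace below) and logging it leaves the producer in a usable state, which is 
exactly what we have not been able to verify:
{code:cpp}
// Hypothetical sketch only; not our production code. The extra handlers would
// log decaf-level and standard exceptions instead of letting them terminate
// the process. Additional includes beyond what the file already pulls in:
#include <cms/CMSException.h>
#include <decaf/lang/Exception.h>

bool AMQueue::send(Message* message) {
    log4cxx::LoggerPtr logger = log4cxx::Logger::getLogger("ActiveMQ");
    try {
        if (_producer) {
            _producer->send(message);
            return true;
        }
    } catch (cms::CMSException& e) {
        LOG4CXX_ERROR(logger, e.getMessage());
    } catch (decaf::lang::Exception& e) {
        // Would have caught the NoSuchElementException seen in the core dump.
        LOG4CXX_ERROR(logger, e.getMessage());
    } catch (std::exception& e) {
        // Last resort for anything else escaping the client library.
        LOG4CXX_ERROR(logger, e.what());
    }
    return false;
}
{code}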
The following stack trace is always present in the core dump: 
{code:java}
#7  0xf7267679 in decaf::lang::Exception::buildMessage(char const*, char*&) () 
from /usr/local/lib/libactivemq-cpp.so.19
#8  0xf72c21c5 in 
decaf::util::NoSuchElementException::NoSuchElementException(char const*, int, 
char const*, ...) () from /usr/local/lib/libactivemq-cpp.so.19
#9  0xf70d2512 in decaf::util::HashMap<unsigned int, 
decaf::lang::Pointer<activemq::transport::FutureResponse, 
decaf::util::concurrent::atomic::AtomicRefCounter>, 
decaf::util::HashCode<unsigned int> >::remove(unsigned int const&) () from 
/usr/local/lib/libactivemq-cpp.so.19
#10 0xf70cfbc0 in (anonymous 
namespace)::ResponseFinalizer::~ResponseFinalizer() () from 
/usr/local/lib/libactivemq-cpp.so.19
#11 0xf70d0d69 in 
activemq::transport::correlator::ResponseCorrelator::request(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>) () from 
/usr/local/lib/libactivemq-cpp.so.19
#12 0xf6eb374e in 
activemq::core::ActiveMQConnection::syncRequest(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>, unsigned int) () from 
/usr/local/lib/libactivemq-cpp.so.19
#13 0xf6eb3c8c in 
activemq::core::ActiveMQConnection::asyncRequest(decaf::lang::Pointer<activemq::commands::Command,
 decaf::util::concurrent::atomic::AtomicRefCounter>, cms::AsyncCallback*) () 
from /usr/local/lib/libactivemq-cpp.so.19
#14 0xf6ff2d7b in 
activemq::core::kernels::ActiveMQSessionKernel::send(activemq::core::kernels::ActiveMQProducerKernel*,
 decaf::lang::Pointer<activemq::commands::ActiveMQDestination, 
decaf::util::concurrent::atomic::AtomicRefCounter>, cms::Message*, int, int, 
long long, activemq::util::MemoryUsage*, long long, cms::AsyncCallback*) () 
from /usr/local/lib/libactivemq-cpp.so.19
#15 0xf6fd733d in 
activemq::core::kernels::ActiveMQProducerKernel::send(cms::Destination const*, 
cms::Message*, int, int, long long, cms::AsyncCallback*) () from 
/usr/local/lib/libactivemq-cpp.so.19
#16 0xf6fcfe54 in 
activemq::core::kernels::ActiveMQProducerKernel::send(cms::Message*) () from 
/usr/local/lib/libactivemq-cpp.so.19
#17 0xf6f3c186 in activemq::core::ActiveMQProducer::send(cms::Message*) () from 
/usr/local/lib/libactivemq-cpp.so.19
{code}
Our only theory at this time is that this only happens after producing a very high 
number of messages, on the order of 8 million or more. This type of crash has never 
happened early in an application run.
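If it helps, the sketch below shows the kind of long-running standalone producer we 
would use to try to confirm that theory. It is hypothetical and has not been shown to 
reproduce the crash; the broker URI, queue name, and message count are placeholders:
{code:cpp}
// Hypothetical reproduction sketch; not confirmed to trigger the crash.
#include <activemq/library/ActiveMQCPP.h>
#include <activemq/core/ActiveMQConnectionFactory.h>
#include <cms/Connection.h>
#include <cms/Session.h>
#include <cms/Queue.h>
#include <cms/MessageProducer.h>
#include <cms/TextMessage.h>
#include <cms/CMSException.h>
#include <memory>

int main() {
    activemq::library::ActiveMQCPP::initializeLibrary();
    try {
        // Failover transport wrapping a single broker, as in our setup.
        activemq::core::ActiveMQConnectionFactory factory(
            "failover:(tcp://broker-host:61616)");
        std::unique_ptr<cms::Connection> connection(factory.createConnection());
        connection->start();
        std::unique_ptr<cms::Session> session(
            connection->createSession(cms::Session::AUTO_ACKNOWLEDGE));
        std::unique_ptr<cms::Queue> queue(session->createQueue("TEST.QUEUE"));
        std::unique_ptr<cms::MessageProducer> producer(
            session->createProducer(queue.get()));

        // Roughly the volume after which we eventually see the crash.
        for (long long i = 0; i < 10000000LL; ++i) {
            std::unique_ptr<cms::TextMessage> message(
                session->createTextMessage("payload"));
            producer->send(message.get());
        }

        connection->close();
    } catch (cms::CMSException& e) {
        e.printStackTrace();
    }
    activemq::library::ActiveMQCPP::shutdownLibrary();
    return 0;
}
{code}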

The title of this issue suggests a possible relation to the FailoverTransport. I came 
to this conclusion because the issue did not occur for most of our app's lifetime; it 
is a fairly recent phenomenon, and its appearance loosely coincides with our switch 
from the tcp:// URI syntax to failover://.

I also noticed that the HashMap in the {{ResponseFinalizer}} object is passed in via 
{{&this->impl->requestMap}} in {{ResponseCorrelator::request}}. A quick search of the 
repository suggests that requestMap is only present in FailoverTransport.cpp (unless I 
missed it elsewhere, which is possible).

The NoSuchElementException is thrown once the {{ResponseFinalizer}} destructor calls 
{{map->remove(commandId);}}.
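For completeness, here is a small standalone program (not taken from our application or 
the library source) showing the behavior we believe the destructor runs into: 
{{decaf::util::HashMap::remove()}} throws {{NoSuchElementException}} when the key is not 
present, rather than returning quietly:
{code:cpp}
// Standalone illustration: decaf's Map::remove() contract throws
// NoSuchElementException for a missing key, matching frames #8/#9 above.
#include <activemq/library/ActiveMQCPP.h>
#include <decaf/util/HashMap.h>
#include <decaf/util/NoSuchElementException.h>
#include <iostream>

int main() {
    activemq::library::ActiveMQCPP::initializeLibrary();
    {
        decaf::util::HashMap<unsigned int, int> requests;
        requests.put(1, 42);
        requests.remove(1);          // fine, the key is present
        try {
            requests.remove(1);      // key already removed
        } catch (decaf::util::NoSuchElementException& e) {
            std::cout << "remove() of a missing key throws: "
                      << e.getMessage() << std::endl;
        }
    }
    activemq::library::ActiveMQCPP::shutdownLibrary();
    return 0;
}
{code}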



> Recurring Segfault bubbles up through ActiveMQProducer::send(); possibly 
> related to FailoverTransport
> -----------------------------------------------------------------------------------------------------
>
>                 Key: AMQCPP-685
>                 URL: https://issues.apache.org/jira/browse/AMQCPP-685
>             Project: ActiveMQ C++ Client
>          Issue Type: Bug
>          Components: CMS Impl
>    Affects Versions: 3.9.4
>         Environment: * Virtual Machine on VMware ESXi 6.7
>  * RHEL 7.9
>  * Kernel 3.10.0-1160.36.2.el7.x86_64
>  * ActiveMQ-CPP 3.9.4 manually compiled 32-bit from source
>            Reporter: Chris A. Evans
>            Assignee: Timothy A. Bish
>            Priority: Major
>              Labels: RedHat
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
