Katherine Pully created AMQCPP-753:
--------------------------------------

             Summary: Deadlock when connection fails before ack can be delivered
                 Key: AMQCPP-753
                 URL: https://issues.apache.org/jira/browse/AMQCPP-753
             Project: ActiveMQ C++ Client
          Issue Type: Bug
          Components: Decaf
    Affects Versions: 3.9.5
         Environment: Unix/Linux
            Reporter: Katherine Pully
            Assignee: Timothy A. Bish
         Attachments: recreated_hang.txt

When a connection fails before a message acknowledgement can be delivered, 
ActiveMQSessionKernel::acknowledge catches only decaf exceptions. However, the 
check for a failed connection ultimately raises a CMS exception. That exception 
is therefore not handled, and the consumer read lock is never released. The 
result is a deadlock when the session later tries to acquire the consumer write 
lock (for example, when cleaning up the connection).
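
The mechanism can be sketched in isolation. The following is not the ActiveMQ-CPP code: DecafLikeException, CmsLikeException, consumersLock, deliverAck and acknowledgeLeaky are hypothetical stand-ins, and std::shared_mutex is assumed as a substitute for the consumer read/write lock. It only illustrates how an exception type that no handler matches can leave a read lock held, so that a writer on another thread cannot get in.

```cpp
#include <shared_mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-ins for the two exception hierarchies (not the real classes).
struct DecafLikeException : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct CmsLikeException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for the consumer read/write lock.
std::shared_mutex consumersLock;

// Stand-in for the failed-connection check, which surfaces as a CMS-style exception.
void deliverAck() {
    throw CmsLikeException("connection failed");
}

// The read lock is released on the normal path and in the decaf handler,
// but not when any other exception type escapes.
void acknowledgeLeaky() {
    consumersLock.lock_shared();
    try {
        deliverAck();
        consumersLock.unlock_shared();
    } catch (const DecafLikeException&) {
        consumersLock.unlock_shared();
        throw;
    }
}

// Returns true if another thread could take the write lock after the failed ack.
bool writerCanAcquireAfterFailedAck() {
    try {
        acknowledgeLeaky();
    } catch (const CmsLikeException&) {
        // The caller sees the failure, but the read lock is still held.
    }

    bool acquired = false;
    // try_lock must run on a different thread than the one holding the read lock.
    std::thread writer([&acquired] {
        acquired = consumersLock.try_lock();
        if (acquired) {
            consumersLock.unlock();
        }
    });
    writer.join();

    // Release the leaked read lock so the sketch itself can exit cleanly.
    consumersLock.unlock_shared();
    return acquired;
}
```

In this sketch writerCanAcquireAfterFailedAck() returns false. A writer that called lock() instead of try_lock() would block indefinitely, which is the situation the session cleanup ends up in.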

I have attached a stack trace from such a deadlock, which occurs when the 
connection is cleaned up. The relevant portion (edited for brevity and clarity; 
the attachment is the original) is:
{code:java}
0  decaf::util::concurrent::ExecutorKernel::Worker::run() ThreadPoolExecutor.cpp:184
1  decaf::util::concurrent::ExecutorKernel::runWorker(decaf::util::concurrent::ExecutorKernel::Worker*) ThreadPoolExecutor.cpp:738
2  activemq::core::OnExceptionRunnable::run() ActiveMQConnection.cpp:439
3  activemq::core::ActiveMQConnection::cleanup() ActiveMQConnection.cpp:839
4  activemq::core::kernels::ActiveMQSessionKernel::dispose() ActiveMQSessionKernel.cpp:371
5  decaf::util::concurrent::locks::AbstractQueuedSynchronizer::acquire(int) AbstractQueuedSynchronizer.cpp:1565
6  decaf::util::concurrent::locks::SynchronizerState::acquireQueued((anonymous namespace)::Node*, int) AbstractQueuedSynchronizer.cpp:711
7  decaf::util::concurrent::locks::LockSupport::park() LockSupport.cpp:54
8  decaf::internal::util::concurrent::Threading::park(decaf::lang::Thread*) Threading.cpp:1345
9  decaf::internal::util::concurrent::PlatformThread::interruptibleWaitOnCondition(_opaque_pthread_cond_t*, _opaque_pthread_mutex_t*, decaf::internal::util::concurrent::CompletionCondition&) PlatformThread.cpp:210
10 _pthread_cond_wait
11 __psynch_cvwait
{code}

The issue can be reproduced by using a client-acknowledge strategy, adding a 
long (10+ second) call to sleep before acknowledging the message, and breaking 
the connection during that sleep.

The exception is originally thrown by 
[ActiveMQConnection::checkClosedOrFailed|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/ActiveMQConnection.cpp#L1329].
These exceptions become ActiveMQ exceptions 
[here|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/ActiveMQConnection.cpp#L1257]
and are then converted to CMS exceptions 
[here|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQConsumerKernel.cpp#L1426].
The only exceptions caught in 
[ActiveMQSessionKernel::acknowledge|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L508]
are [decaf exceptions|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L518];
when a CMS exception is thrown instead, the [consumer read lock|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L510]
is not released.

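This issue can be fixed by also catching CMS exceptions in 
ActiveMQSessionKernel::acknowledge, so that the consumer read lock is released 
on that path as well.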
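
A minimal sketch of that kind of fix, again using hypothetical stand-ins rather than the real classes (DecafLikeException, CmsLikeException, consumersLock, deliverAck and acknowledgeFixed are all assumed names, and std::shared_mutex stands in for the consumer lock): the extra handler releases the read lock before rethrowing, so a writer on another thread can proceed.

```cpp
#include <shared_mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-ins for the two exception hierarchies (not the real classes).
struct DecafLikeException : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct CmsLikeException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Stand-in for the consumer read/write lock.
std::shared_mutex consumersLock;

// Stand-in for the failed-connection check, which surfaces as a CMS-style exception.
void deliverAck() {
    throw CmsLikeException("connection failed");
}

// Same shape as the leaky pattern, plus a handler for the CMS-style exception
// that releases the read lock before passing the error on.
void acknowledgeFixed() {
    consumersLock.lock_shared();
    try {
        deliverAck();
        consumersLock.unlock_shared();
    } catch (const DecafLikeException&) {
        consumersLock.unlock_shared();
        throw;
    } catch (const CmsLikeException&) {
        consumersLock.unlock_shared();
        throw;
    }
}

// Returns true if the failure still reaches the caller. The writer thread uses a
// blocking lock(), which would never return if the read lock had been leaked.
bool failedAckStillPropagates() {
    bool propagated = false;
    try {
        acknowledgeFixed();
    } catch (const CmsLikeException&) {
        propagated = true;
    }

    std::thread writer([] {
        consumersLock.lock();
        consumersLock.unlock();
    });
    writer.join();
    return propagated;
}
```

An alternative under the same assumptions would be a scoped guard such as std::shared_lock, which releases the lock on every exit path without needing one handler per exception type.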

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
