Katherine Pully created AMQCPP-753:
--------------------------------------
Summary: Deadlock when connection fails before ack can be delivered
Key: AMQCPP-753
URL: https://issues.apache.org/jira/browse/AMQCPP-753
Project: ActiveMQ C++ Client
Issue Type: Bug
Components: Decaf
Affects Versions: 3.9.5
Environment: Unix/Linux
Reporter: Katherine Pully
Assignee: Timothy A. Bish
Attachments: recreated_hang.txt
When a connection fails before a message acknowledgement can be delivered, only
decaf exceptions are caught. However, the check for a failed connection
ultimately results in a CMS exception. As a result, this exception is not
handled and the consumer read lock does not get released. This will result in a
deadlock when the session tries to acquire the consumer write lock (for
example, when cleaning up the connection).
I have attached a stack trace from such a deadlock, which occurs when the
connection is cleaned up. The relevant portion (edited for brevity and clarity;
the attachment is the original) is:
{code:java}
0 decaf::util::concurrent::ExecutorKernel::Worker::run() ThreadPoolExecutor.cpp:184
1 decaf::util::concurrent::ExecutorKernel::runWorker(decaf::util::concurrent::ExecutorKernel::Worker*) ThreadPoolExecutor.cpp:738
2 activemq::core::OnExceptionRunnable::run() ActiveMQConnection.cpp:439
3 activemq::core::ActiveMQConnection::cleanup() ActiveMQConnection.cpp:839
4 activemq::core::kernels::ActiveMQSessionKernel::dispose() ActiveMQSessionKernel.cpp:371
5 decaf::util::concurrent::locks::AbstractQueuedSynchronizer::acquire(int) AbstractQueuedSynchronizer.cpp:1565
6 decaf::util::concurrent::locks::SynchronizerState::acquireQueued((anonymous namespace)::Node*, int) AbstractQueuedSynchronizer.cpp:711
7 decaf::util::concurrent::locks::LockSupport::park() LockSupport.cpp:54
8 decaf::internal::util::concurrent::Threading::park(decaf::lang::Thread*) Threading.cpp:1345
9 decaf::internal::util::concurrent::PlatformThread::interruptibleWaitOnCondition(_opaque_pthread_cond_t*, _opaque_pthread_mutex_t*, decaf::internal::util::concurrent::CompletionCondition&) PlatformThread.cpp:210
10 _pthread_cond_wait
11 __psynch_cvwait
{code}
The issue can be reproduced by using a client-acknowledge strategy, adding a
long sleep (10 seconds or more) before acknowledging the message, and breaking
the connection during that sleep.
The exception is originally thrown by
[ActiveMQConnection::checkClosedOrFailed|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/ActiveMQConnection.cpp#L1329].
These exceptions are converted to ActiveMQ exceptions
[here|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/ActiveMQConnection.cpp#L1257]
and then to CMS exceptions
[here|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQConsumerKernel.cpp#L1426].
The only exceptions caught in
[ActiveMQSessionKernel::acknowledge|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L508]
are [decaf
exceptions|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L518];
when a CMS exception is thrown instead, it is not caught and the [consumer read
lock|https://github.com/apache/activemq-cpp/blob/master/activemq-cpp/src/main/activemq/core/kernels/ActiveMQSessionKernel.cpp#L510]
is not released.
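This issue can be fixed by also catching CMS exceptions in
ActiveMQSessionKernel::acknowledge, so that the consumer read lock is released
before the exception propagates.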