[jira] [Created] (QPID-7051) Crash after reconnect with transactional session (with patch)

JIRA Mon, 08 Feb 2016 06:24:06 -0800

Håkan Johansson created QPID-7051:
-------------------------------------

             Summary: Crash after reconnect with transactional session (with patch)
                 Key: QPID-7051
                 URL: https://issues.apache.org/jira/browse/QPID-7051
             Project: Qpid
          Issue Type: Bug
          Components: C++ Client
    Affects Versions: qpid-cpp-0.34
         Environment: Red Hat Enterprise Linux Server release 6.7 (Santiago)


The broker is ActiveMQ 5.13.0.
The protocol used is AMQP 1.0.

            Reporter: Håkan Johansson


I have a test program (see the "consumer.cc" attachment) that creates a 
connection with "reconnect" enabled.
It then creates a transactional session and a receiver to some queue from that 
session.
It then reads all messages from the queue and prints out their content.
The program sleeps between reads so that there is time to restart the broker 
while it is still reading.

While the broker is down, the program tries to reconnect to it.
As soon as the reconnect succeeds, the fetch call throws an exception because 
the transaction has become invalid.
The exception is caught and the program breaks out of the read loop.
The test function then exits, causing the _Receiver_, _Session_, and 
_Connection_ objects to be destructed.

The crash happens while destructing the _Connection_ object.

It took some digging, but I managed to find the reason for the crash.
When the _Connection_ object is destructed it automatically destructs its 
_ConnectionHandle_ object, which in turn destructs its _ConnectionContext_ 
object. Nothing strange here.
The _ConnectionContext_ destructor makes a call to its own _close_ method, 
which tries to shut down all its sessions.

The problem is that the session has been made invalid by the disconnect, which 
causes the call to _syncLH_ to throw an exception,
which is not caught anywhere, indirectly causing the _ConnectionContext_ 
destructor to throw an exception. This is a big no-no in C++.

A side effect of this is that the transport object is not closed before it is 
destructed,
which means that it is still listening for events. The crash happens when the 
next pending event tries to use
the destructed transport object.

The solution, in my humble opinion, is to catch the exception thrown by the 
_syncLH_ call in the _ConnectionContext::close_ method.
This way we can try to close all sessions even if one or more of them are 
invalidated for some reason.
The rest of the cleanup process will also be done properly.
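
The idea can be sketched in isolation as follows. This is only an 
illustration, not the attached patch and not the real qpid code: the types 
SketchSession, SketchTransport and SketchContext, the sync() helper (standing 
in for _syncLH_) and runSketch() are all hypothetical names made up for the 
example. It assumes the close method walks its sessions in a loop and closes 
the transport afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real qpid classes.
struct SketchSession {
    std::string name;
    bool valid;   // false once the disconnect has invalidated the session
    bool closed;
    SketchSession(const std::string& n, bool v)
        : name(n), valid(v), closed(false) {}
};

struct SketchTransport {
    bool closed;
    SketchTransport() : closed(false) {}
    void close() { closed = true; }
};

class SketchContext {
  public:
    std::vector<SketchSession> sessions;
    SketchTransport transport;
    std::size_t failures;   // number of sessions whose sync threw

    SketchContext() : failures(0) {}

    // Tries to close every session. A failure in one session is logged and
    // counted, but it does not stop the loop or skip the transport shutdown.
    void close() {
        for (std::size_t i = 0; i < sessions.size(); ++i) {
            try {
                sync(sessions[i]);            // may throw
                sessions[i].closed = true;
            } catch (const std::exception& e) {
                ++failures;
                std::cerr << "ignoring error while closing session '"
                          << sessions[i].name << "': " << e.what() << '\n';
            }
        }
        transport.close();   // always reached, so no events hit a dead object
    }

    // Because close() swallows the per-session errors above, no exception
    // from the stand-in sync() can escape from the destructor.
    ~SketchContext() { close(); }

  private:
    // Stand-in for the syncLH call: throws if the session is invalid.
    static void sync(const SketchSession& s) {
        if (!s.valid) {
            throw std::runtime_error("session invalidated by disconnect");
        }
    }
};

// Usage: one invalidated session followed by a healthy one.
inline SketchContext runSketch() {
    SketchContext ctx;
    ctx.sessions.push_back(SketchSession("transactional", false));
    ctx.sessions.push_back(SketchSession("plain", true));
    ctx.close();
    assert(ctx.transport.closed);
    return ctx;
}
```

With the try/catch in place, the invalid session is skipped, the healthy 
session is still closed, and the transport is closed before the object goes 
away. Without it, the first throw would leave the loop and the transport 
shutdown would never run, which is the situation described above.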


How to run the test program:
* Compile both "producer.cc" and "consumer.cc". They both need to be linked to 
the "qpidmessaging" library.
* Run "producer" once. This will add ten messages to the "apa.bepa" queue on 
the broker.
* Start "consumer".
* When the consumer starts to print out the messages, shut down and restart the 
broker.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

