[
https://issues.apache.org/jira/browse/QPID-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Håkan Johansson updated QPID-8056:
----------------------------------
Description:
When doing HA testing we found that our application often crashed inside the
Qpid Messaging library.
Our test:
* One ActiveMQ broker.
* Two proxies connecting to the AMQP port on the broker. At the start, only one
of the proxies is running.
* Test program configured to use failover between the two proxies. Protocol is
"amqp1.0". It reads messages in a loop using a transactional session. On error
it closes the connection and opens a new one.
* Three queues are read from in parallel, each reader using its own connection
in a thread. Nothing is shared between the threads in the client code.
* Send some messages and let the test program process them.
* Stop proxy1 and start proxy2.
* Send some more messages and let the test program process them.
* Stop proxy2 and start proxy1.
* And so on...
After a couple of switches the test program crashes, but not always; the crash
is timing-dependent.
A typical error message that we see before the crash:
{noformat}
Exception when trying to close the qpid connection: Transaction outcome
unknown: transport failure
{noformat}
The reason for the crash is that the poller thread is still active when the
connection is being deleted. The destructor of the
{{qpid::messaging::ConnectionContext}} class deletes the {{TcpTransport}}
instance at the same time as, or right before, the poller thread is calling a
callback on it ({{qpid::messaging::amqp::TcpTransport::disconnected}}).
I have attached a patch to solve the issue, at least for this use case.
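The attached patch is not reproduced here. As a rough illustration of the
ordering issue described above, the following is a minimal sketch in standard
C++ only. {{SketchTransport}}, {{SketchConnection}} and
{{callbacksStopAfterDestruction}} are hypothetical names and are not Qpid
classes or functions; the sketch assumes the owner can stop and join the
polling thread itself, which may not match how the real poller is managed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a transport that receives poller callbacks.
struct SketchTransport {
    std::atomic<int>& callbacks;
    explicit SketchTransport(std::atomic<int>& counter) : callbacks(counter) {}
    void disconnected() { ++callbacks; }
};

// Hypothetical owner, playing the role of the connection context.
class SketchConnection {
public:
    explicit SketchConnection(std::atomic<int>& counter)
        : transport(new SketchTransport(counter)),
          running(true),
          poller([this] { pollLoop(); }) {}

    ~SketchConnection() {
        // Stop the poller and wait for it to finish *before* the transport
        // is destroyed, so no callback can run on a deleted object.
        running = false;
        poller.join();
        assert(!poller.joinable());
        transport.reset();
    }

private:
    // Simulates a poller thread that keeps invoking a transport callback.
    void pollLoop() {
        do {
            transport->disconnected();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        } while (running);
    }

    // Declaration order matters: the thread is constructed last, after the
    // transport and the flag it uses already exist.
    std::unique_ptr<SketchTransport> transport;
    std::atomic<bool> running;
    std::thread poller;
};

// Returns true if callbacks ran while the connection was alive and none
// were observed after it was destroyed.
bool callbacksStopAfterDestruction() {
    std::atomic<int> counter(0);
    {
        SketchConnection connection(counter);
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
    const int afterDestroy = counter.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return afterDestroy > 0 && counter.load() == afterDestroy;
}
```

If the destructor released the transport first and joined the thread afterwards,
the callback could be invoked on a destroyed object, which is the kind of
crash described in this issue.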
I cannot test this on {{1.37.0}} because I cannot build that version on RHEL6:
RHEL6 ships Python 2.6, which {{1.37.0}} no longer supports. The code in
question is identical in {{1.36.0}} and {{1.37.0}}, though.
was:
The same description as above, except that it lacked the list item about three
queues being read in parallel.
> qpid::messaging::ConnectionContext crash after network disconnect (with patch)
> ------------------------------------------------------------------------------
>
> Key: QPID-8056
> URL: https://issues.apache.org/jira/browse/QPID-8056
> Project: Qpid
> Issue Type: Bug
> Components: C++ Client
> Affects Versions: qpid-cpp-1.36.0
> Environment: RedHat Enterprise Linux 6
> Reporter: Håkan Johansson
> Labels: crash
> Fix For: qpid-cpp-1.38.0
>
> Attachments: connection_context.diff, valgrind.txt