[
https://issues.apache.org/jira/browse/AMQNET-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rob Waite updated AMQNET-409:
-----------------------------
Description:
We were testing failover under heavy load. We have a queue that I placed 500k
messages in. I then have 5 consumers pulling them as fast as they can. The
operations are noops essentially... so messages are consumed in our case at
around 500 messages per second with our prod servers.
I would bring up the consumers and they would be consuming away. I would then
"kill -9" AMQ and then observe the client. From the AMQ web admin... they would
come back but you would see the 5 consumers go back and forth between 5 and 0
and comsumption went down from 500/s to about 3/s.
Looking at AMQ logs... there was a series of errors like this " Async error
occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then
there would be a couple variations of exceptions regarding the connection being
lost (EOFException, broken pipe, etc). There would be an endless loop creating
this pattern.. a message about an unmatched ack... a message about a socket
dying... then reconnection from the client... on and on.
I can provide the full stack traces but I have already found the line that
seems to cause the client to reconnect, fail, disconnect ad infinitum.
In our particular case... there is a line in the NMS.ActiveMQ source in
Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like:
{code:title=Connection.cs|borderStyle=solid}
Tracer.DebugFormat("Connection[{0}]: Async exception with no exception
listener: " + error, this.ConnectionId);
{code}
The error variable was populated with:
"Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck
{commandId = 1146, responseRequired = false, ackType = 2, consumerId =
ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId =
ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId =
ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination =
queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1,
poisonCause = null}; Could not find Message-ID
ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list
(start of ack)"
Notice that there are curly braces in there... when you try to concat this with
the first parameter in the Tracer.DebugFormat call you get a
System.FormatException with the message "Input string was not in a correct
format."
Changing the line to the following seemed to fix the particular case we were
seeing.. but I imagine other use cases may cause the same:
{code:title=Connection.cs|borderStyle=solid}
Tracer.DebugFormat("Connection[{0}]: Async exception with no exception
listener: {1}", this.ConnectionId, error);
{code}
It seems odd that String.Format style logging is used but a string is still
concatenated. This seems to happen many places in the code. I would think that
either the Tracer should handle this exception or trim illegal characters or
DebugFormat should not use concatenation for the first param.
was:
We were testing failover under heavy load. We have a queue that I placed 500k
messages in. I then have 5 consumers pulling them as fast as they can. The
operations are noops essentially... so messages are consumed in our case at
around 500 messages per second with our prod servers.
I would bring up the consumers and they would be consuming away. I would then
"kill -9" AMQ and then observe the client. From the AMQ web admin... they would
come back but you would see the 5 consumers go back and forth between 5 and 0
and comsumption went down from 500/s to about 3/s.
Looking at AMQ logs... there was a series of errors like this " Async error
occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then
there would be a couple variations of exceptions regarding the connection being
lost (EOFException, broken pipe, etc). There would be an endless loop creating
this pattern.. a message about an unmatched ack... a message about a socket
dying... then reconnection from the client... on and on.
I can provide the full stack traces but I have already found the line that
seems to cause the client to reconnect, fail, disconnect ad infinitum.
In our particular case... there is a line in the NMS.ActiveMQ source in
Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like:
{code:title=Connection.cs|borderStyle=solid}
Tracer.DebugFormat("Connection[{0}]: Async exception with no exception
listener: " + error, this.ConnectionId);
{code}
The error variable was populated with:
"Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck
{commandId = 1146, responseRequired = false, ackType = 2, consumerId =
ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId =
ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId =
ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination =
queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1,
poisonCause = null}; Could not find Message-ID
ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list
(start of ack)"
Notice that there are curly braces in there... when you try to concat this with
the first parameter in the Tracer.DebugFormat call you get a
System.FormatException with the message "Input string was not in a correct
format."
It seems odd that String.Format style logging is used but a string is still
concatenated. This seems to happen many places in the code. I would think that
either the Tracer should handle this exception or trim illegal characters or
DebugFormat should not use concatenation for the first param.
> Failover fails under load. Client goes into reconnect loop.
> -----------------------------------------------------------
>
> Key: AMQNET-409
> URL: https://issues.apache.org/jira/browse/AMQNET-409
> Project: ActiveMQ .Net
> Issue Type: Bug
> Components: ActiveMQ
> Affects Versions: 1.5.6
> Environment: Should not matter. AMQ is on Linux, Client is on Windows
> with .NET 4
> Reporter: Rob Waite
> Assignee: Jim Gomes
>
> We were testing failover under heavy load. We have a queue that I placed 500k
> messages in. I then have 5 consumers pulling them as fast as they can. The
> operations are noops essentially... so messages are consumed in our case at
> around 500 messages per second with our prod servers.
> I would bring up the consumers and they would be consuming away. I would then
> "kill -9" AMQ and then observe the client. From the AMQ web admin... they
> would come back but you would see the 5 consumers go back and forth between 5
> and 0 and comsumption went down from 500/s to about 3/s.
> Looking at AMQ logs... there was a series of errors like this " Async error
> occurred: javax.jms.JMSException: Unmatched acknowledge: MessageAck" and then
> there would be a couple variations of exceptions regarding the connection
> being lost (EOFException, broken pipe, etc). There would be an endless loop
> creating this pattern.. a message about an unmatched ack... a message about a
> socket dying... then reconnection from the client... on and on.
> I can provide the full stack traces but I have already found the line that
> seems to cause the client to reconnect, fail, disconnect ad infinitum.
> In our particular case... there is a line in the NMS.ActiveMQ source in
> Connection.cs line 950 (in Apache.NMS.ActiveMQ-1.5.6-src) that looks like:
> {code:title=Connection.cs|borderStyle=solid}
> Tracer.DebugFormat("Connection[{0}]: Async exception with no exception
> listener: " + error, this.ConnectionId);
> {code}
> The error variable was populated with:
> "Apache.NMS.NMSConnectionException: Unmatched acknowledge: MessageAck
> {commandId = 1146, responseRequired = false, ackType = 2, consumerId =
> ID:kcccccc-54512-634956893773509011-1:10:3:1, firstMessageId =
> ID:qaaaaa-54879-634956774946428300-1:0:1:2:60645, lastMessageId =
> ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645, destination =
> queue://AAA.RwwTest.Failover, transactionId = null, messageCount = 1,
> poisonCause = null}; Could not find Message-ID
> ID:qaaaaaa-ops01-54879-634956774946428300-1:0:1:2:60645 in dispatched-list
> (start of ack)"
> Notice that there are curly braces in there... when you try to concat this
> with the first parameter in the Tracer.DebugFormat call you get a
> System.FormatException with the message "Input string was not in a correct
> format."
> Changing the line to the following seemed to fix the particular case we were
> seeing.. but I imagine other use cases may cause the same:
> {code:title=Connection.cs|borderStyle=solid}
> Tracer.DebugFormat("Connection[{0}]: Async exception with no exception
> listener: {1}", this.ConnectionId, error);
> {code}
> It seems odd that String.Format style logging is used but a string is still
> concatenated. This seems to happen many places in the code. I would think
> that either the Tracer should handle this exception or trim illegal
> characters or DebugFormat should not use concatenation for the first param.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira