Christian Danner created ARTEMIS-3264:
-----------------------------------------
Summary: Core to AMQP conversion error causes client disconnect
Key: ARTEMIS-3264
URL: https://issues.apache.org/jira/browse/ARTEMIS-3264
Project: ActiveMQ Artemis
Issue Type: Bug
Components: AMQP, Broker
Affects Versions: 2.17.0
Environment: Embedded Apache Artemis 2.17.0
Windows Server 2016 Standard (10.0.14393)
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
Reporter: Christian Danner
Attachments: activemq_artemis.log
We are deploying a mesh of embedded brokers and by default use core bridges to
replicate data between different broker instances / topics.
The clients that actually consume messages are connected using AMQP (Qpid,
AMQP.Net Lite).
Recently we encountered a situation where the broker could not deliver a
message to a (Java QPID) client because the internal conversion from Core to
AMQP failed (see attached log file).
As a result, the client was disconnected and no longer received any messages.
It remained blocked in a JMS receive call and was apparently never informed
about the disconnect (I am not sure whether this is a QPID/Proton issue).
Even after restarting the client, it could not connect to the server anymore.
We had to restart the server to be able to connect again.
We are currently working around this issue by using AMQP (i.e. JMS) as the only
client-side protocol, so that no Core-to-AMQP conversion happens in the first
place.
However, I'm wondering if the way the broker deals with such errors is a good
idea - it disconnects the client and keeps the message in the queue, so even
after reconnect the delivery fails again with the same Exception!
Looking at the call stack (ending up in QueueImpl:3800), this kind of error is
handled in a very generic way: the handler method does not distinguish between
different types of Exceptions and knows nothing about the reason why delivery
failed. It nevertheless defaults to disconnecting the corresponding client.
I think in the situation described above it would be necessary to forward the
erroneous message to a DLQ instead and continue with the next message.
Currently the message clogs the queue and needs to be deleted / moved manually
in order for processing to continue.
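
To illustrate the behaviour I have in mind, here is a minimal, self-contained
sketch. It is not Artemis code: DeliverySketch, ConversionException, Result and
drain are hypothetical names invented for this example, and the "DLQ" is just a
list. It only shows the idea of treating a conversion failure as a poison
message (dead-letter it and continue) while keeping the generic "disconnect the
consumer" handling for every other kind of error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not Artemis code: every name here is invented for illustration.
class DeliverySketch {

    /** Hypothetical marker for a failure caused by the message itself, e.g. a failed protocol conversion. */
    static class ConversionException extends RuntimeException {
        ConversionException(String msg) { super(msg); }
    }

    /** Outcome of draining the queue for one consumer. */
    static class Result {
        final List<String> delivered = new ArrayList<>();
        final List<String> deadLettered = new ArrayList<>();
        boolean consumerDisconnected = false;
    }

    /**
     * Drains the queue through the given converter.
     * A ConversionException is treated as a poison message: the message is moved
     * to the dead-letter list and delivery continues with the next message.
     * Any other exception keeps the generic behaviour: the message stays at the
     * head of the queue and the consumer is marked as disconnected.
     */
    static Result drain(Deque<String> queue, Function<String, String> converter) {
        Result result = new Result();
        while (!queue.isEmpty()) {
            String message = queue.peekFirst();
            try {
                result.delivered.add(converter.apply(message));
                queue.removeFirst();
            } catch (ConversionException e) {
                // The message itself is the problem: dead-letter it and move on.
                result.deadLettered.add(queue.removeFirst());
            } catch (RuntimeException e) {
                // Unknown cause: leave the message in place and drop the consumer.
                result.consumerDisconnected = true;
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("a", "bad", "c"));
        Result r = drain(queue, m -> {
            if (m.equals("bad")) {
                throw new ConversionException("cannot convert " + m);
            }
            return m.toUpperCase();
        });
        System.out.println("delivered=" + r.delivered + " dlq=" + r.deadLettered
                + " disconnected=" + r.consumerDisconnected);
    }
}
```

With a handler along these lines, a single unconvertible message would no
longer block the messages queued behind it, and the consumer would stay
connected unless the failure is of some other, unknown kind.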