Jean-Pascal Briquet created ARTEMIS-5140:
--------------------------------------------
Summary: Poisonous message in $.artemis.internal message causes
high resource usage on target redistribution node in cluster
Key: ARTEMIS-5140
URL: https://issues.apache.org/jira/browse/ARTEMIS-5140
Project: ActiveMQ Artemis
Issue Type: Bug
Components: Broker, Clustering
Reporter: Jean-Pascal Briquet
Attachments: message-redistribution-failing-in-loop.log,
messages-accumulated-in-notif-queues.png, notif-queue-created-in-loop.log,
notif-queues-growing.png
*Configuration:*
A cluster of three nodes (A, B, C) with message redistribution enabled.
*Description:*
When cluster connectivity starts, each Artemis node creates a
$.artemis.internal queue for each of the other nodes in the cluster.
Artemis moves messages pending redistribution into these queues.
On node C, if a poisonous (non-forwardable) message is added or moved to a
"$.artemis.internal" queue (here, the queue targeting node B), the following happens:
* the cluster connection bridge attempts to forward the message
* the bridge fails in the beforeForward step and throws an exception, because the message lacks the properties required for redistribution (no queue IDs)
* the cluster connection and its consumers are closed immediately
* one second later, the cluster connection and its consumers are re-created, which triggers the creation of a new "notif.*" queue on node B
This sequence repeats in a loop and causes continuously high CPU and disk usage
on node B, as the "activemq.notification" address keeps accumulating messages in
the growing number of "notif.*" queues.
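To make the feedback loop concrete, here is a toy Python model of the sequence above. It is not Artemis code: the property key, the `TargetNode` class and `run_bridge` are hypothetical stand-ins, and the indefinite one-second retry is replaced by a bounded number of reconnect attempts. It shows the two symptoms: the poisonous message never leaves the internal queue, and every reconnect adds one more notif.* queue on node B.

```python
"""Toy model of the reconnection loop (hypothetical names, not Artemis code)."""
from collections import deque
from dataclasses import dataclass, field

# Hypothetical stand-in for the queue-ID property needed for redistribution.
ROUTE_IDS_KEY = "queue_ids"


@dataclass
class TargetNode:
    """Models node B: each new cluster-connection consumer gets a fresh notif.* queue."""
    notif_queues: list = field(default_factory=list)

    def on_bridge_connected(self) -> None:
        self.notif_queues.append(f"notif.{len(self.notif_queues)}")


def before_forward(message: dict) -> None:
    # Models the failing step: a message without queue IDs cannot be redistributed.
    if ROUTE_IDS_KEY not in message:
        raise ValueError("message has no queue IDs")


def run_bridge(internal_queue: deque, target: TargetNode, reconnect_attempts: int) -> int:
    """Simulate the bridge draining the internal queue; return messages forwarded."""
    forwarded = 0
    for _ in range(reconnect_attempts):
        target.on_bridge_connected()      # (re)connect -> one more notif.* queue on B
        try:
            while internal_queue:
                before_forward(internal_queue[0])
                internal_queue.popleft()  # forwarded successfully
                forwarded += 1
            return forwarded              # queue drained, no more reconnects
        except ValueError:
            # Connection and consumers are closed; the poisonous message stays at
            # the head of the queue, so the next attempt fails the same way.
            continue
    return forwarded


if __name__ == "__main__":
    q = deque([{ROUTE_IDS_KEY: b"\x01"}, {"body": "poison"}])
    node_b = TargetNode()
    print(run_bridge(q, node_b, reconnect_attempts=5))  # 1
    print(len(node_b.notif_queues))                     # 5
    print(len(q))                                       # 1 (poison still queued)
```

With a healthy queue the bridge connects once and drains it; with a poisonous head message the number of notif.* queues grows with every retry while nothing more is forwarded.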
A protection mechanism could move poisonous messages back to their original
queue (if it can be identified from the message properties) or, if this is not
possible, move the invalid message to a dead-letter queue.
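A minimal sketch of the suggested protection, under the same toy assumptions (not an existing Artemis feature or API): messages are dicts, queues are a dict of name to deque, and `queue_ids`, `original_queue` and `DLQ` are hypothetical names. The idea is to validate each message before forwarding and divert an invalid one, instead of letting the exception close the cluster connection.

```python
"""Sketch of the proposed protection (hypothetical names, not an Artemis API)."""
from collections import deque

ROUTE_IDS_KEY = "queue_ids"            # hypothetical redistribution property
ORIGINAL_QUEUE_KEY = "original_queue"  # hypothetical "origin queue" property
DLQ = "DLQ"                            # hypothetical dead-letter queue name
INTERNAL_PREFIX = "$.artemis.internal"


def is_forwardable(message: dict) -> bool:
    return ROUTE_IDS_KEY in message


def divert_poisonous(message: dict, queues: dict) -> str:
    """Move a non-forwardable message to its origin queue or the DLQ; return the name."""
    origin = message.get(ORIGINAL_QUEUE_KEY)
    # Only return it to a known, non-internal queue; otherwise the loop would restart.
    if origin in queues and not str(origin).startswith(INTERNAL_PREFIX):
        destination = origin
    else:
        destination = DLQ
    queues.setdefault(destination, deque()).append(message)
    return destination


def drain_internal_queue(internal: deque, queues: dict) -> list:
    """Forward valid messages; divert invalid ones instead of failing the bridge."""
    forwarded = []
    while internal:
        message = internal.popleft()
        if is_forwardable(message):
            forwarded.append(message)  # would be sent to the remote node
        else:
            divert_poisonous(message, queues)
    return forwarded
```

Because the check runs before forwarding, the bridge in this model stays connected, so no extra notif.* queues would be created on the target node. The guard against routing back into a $.artemis.internal queue is a design choice of this sketch to avoid re-injecting the message into the same loop.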
*Note:*
The problem was first seen when an operator moved a message stuck in a
"duplicated" internal queue into the standard internal queue to start its
redistribution.
Screenshots and related logs are attached.
*Reproduction:*
To reproduce, put or move a message into a $.artemis.internal queue.
This triggers the reconnection loop almost instantly on the node where the
message was injected.
Resource usage on the node targeted by the $.artemis.internal queue increases
rapidly as more and more "notif.*" queues are created.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)