Ilkka Virolainen created ARTEMIS-1864:
-----------------------------------------
Summary: On-Demand Message Redistribution Can Spontaneously Start
Failing in Single Direction
Key: ARTEMIS-1864
URL: https://issues.apache.org/jira/browse/ARTEMIS-1864
Project: ActiveMQ Artemis
Issue Type: Bug
Components: Broker
Affects Versions: 2.5.0
Environment: RHEL 6.2
Reporter: Ilkka Virolainen
The message redistribution of an Artemis cluster can spontaneously fail after
running for a while. I've witnessed this several times using a two-node
colocated replicating cluster with a basic configuration:
{code:xml}
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <retry-interval>500</retry-interval>
      <reconnect-attempts>5</reconnect-attempts>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>
{code}
After running for a while (approx. two weeks), one of the nodes (node a) stops
consuming messages from the other node's (node b) internal store-and-forward
queue. As a result, message redistribution stops working from node b -> node a
but still works from node a -> node b. The cause is unknown: nothing of note is
logged on either broker, and JMX shows that the cluster topology and the broker
cluster bridge connection are intact. This causes significant problems, mainly:
1. Client communication only works as expected if the clients happen to
connect to the right brokers.
2. Unconsumed messages pile up in the internal store-and-forward queue and
consume unnecessary resources. It's also possible (but not verified) that
messages leak memory when they expire in the internal queue.
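
Since nothing is logged when this happens, one way to notice the condition
early might be to poll the counters of the internal store-and-forward queue and
flag a backlog that is never acknowledged. The following is only a rough sketch
under stated assumptions, not a verified diagnostic: the class
SfQueueStallCheck and its methods are hypothetical, the JMX attribute names
"MessageCount" and "MessagesAcknowledged" are assumed to be exposed by the
queue's MBean and should be checked against the broker in use, and the caller
has to supply the ObjectName of the store-and-forward queue.

```java
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper for illustration; not part of Artemis. */
final class SfQueueStallCheck {

    /** One poll of the store-and-forward queue's counters. */
    static final class Sample {
        final long messageCount;         // messages currently held in the queue
        final long messagesAcknowledged; // cumulative acknowledgements so far

        Sample(long messageCount, long messagesAcknowledged) {
            this.messageCount = messageCount;
            this.messagesAcknowledged = messagesAcknowledged;
        }
    }

    /**
     * Reads one sample over JMX. The attribute names are assumptions and
     * should be verified against the queue MBean of the broker in use.
     */
    static Sample read(MBeanServerConnection conn, ObjectName queue) throws Exception {
        long count = ((Number) conn.getAttribute(queue, "MessageCount")).longValue();
        long acked = ((Number) conn.getAttribute(queue, "MessagesAcknowledged")).longValue();
        return new Sample(count, acked);
    }

    /**
     * Returns true when every sample in the window shows a non-empty queue
     * and the acknowledged counter never changed across the window.
     * Fewer than two samples are never treated as a stall.
     */
    static boolean isStalled(List<Sample> window) {
        if (window.size() < 2) {
            return false;
        }
        long firstAcked = window.get(0).messagesAcknowledged;
        for (Sample s : window) {
            if (s.messageCount == 0 || s.messagesAcknowledged != firstAcked) {
                return false;
            }
        }
        return true;
    }
}
```

Collecting samples with read() at a fixed interval on node b and passing the
most recent few to isStalled() would report a store-and-forward queue that
holds messages while node a acknowledges none of them. The window length is a
judgment call: it needs to be longer than any normal pause in consumption.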