Howard Gao created ARTEMIS-2854:
-----------------------------------

             Summary: Non-durable subscribers may stop receiving after failover
                 Key: ARTEMIS-2854
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2854
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.14.0
            Reporter: Howard Gao
            Assignee: Howard Gao
             Fix For: 2.15.0


In a cluster scenario where non-durable subscribers fail over to a backup while
another live node is forwarding messages to them, there is a chance that the
live node keeps the old remote binding for the subscribers. Messages routed to
those old remote bindings will result in "binding not found".

For example, suppose there are two live-backup pairs in the cluster: Live1 and
backup1, Live2 and backup2. A non-durable subscriber connects to Live1, and
messages are sent to Live2 and then redistributed to the subscriber on Live1.

Now Live1 crashes and backup1 becomes live. The subscriber fails over to
backup1. In the meantime Live2 reconnects to backup1 too. During this process
Live2 does not successfully remove the old remote binding for the subscriber,
so the binding still points to the old temp queue's id (which is gone along
with Live1, as it is a temp queue). As a result, messages sent after failover
are still routed to the old queue, which is no longer there. The subscriber
sits idle without receiving any new messages.

The code concerned is this:

https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/cluster/impl/ClusterConnectionImpl.java#L1239

The code doesn't handle the case where the old remote binding is still in the
map and its key (clusterName) is the same as that of the new remote binding
(which references a new temp queue) recreated on failover.
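
The key collision described above can be illustrated with a minimal,
self-contained sketch. This is not the Artemis code: the class names
(RemoteBindingSketch, BindingMapSketch), the methods and the queue ids are
hypothetical, and the "keep the existing entry" behaviour of addIfAbsent is
only an assumption about how a stale entry could survive. The sketch shows
that when the old and new bindings share the same clusterName key, the map
entry has to be replaced for routing to reach the new temp queue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a remote binding: a cluster-wide name plus the id
// of the queue it points to on the remote node.
final class RemoteBindingSketch {
    final String clusterName;
    final long remoteQueueId;

    RemoteBindingSketch(String clusterName, long remoteQueueId) {
        this.clusterName = clusterName;
        this.remoteQueueId = remoteQueueId;
    }
}

// Hypothetical stand-in for the map of remote bindings kept by a live node.
final class BindingMapSketch {
    private final Map<String, RemoteBindingSketch> bindings = new HashMap<>();

    // Assumed failure mode: an entry already stored under the key is kept,
    // so a stale binding left over from before failover is never updated.
    void addIfAbsent(RemoteBindingSketch binding) {
        bindings.putIfAbsent(binding.clusterName, binding);
    }

    // Defensive variant: the new binding always wins. The replaced (stale)
    // binding, if any, is returned so the caller can clean it up.
    RemoteBindingSketch addOrReplace(RemoteBindingSketch binding) {
        return bindings.put(binding.clusterName, binding);
    }

    // Queue id that a message for this clusterName would be routed to,
    // or null if no binding is registered.
    Long routeTarget(String clusterName) {
        RemoteBindingSketch binding = bindings.get(clusterName);
        return binding == null ? null : binding.remoteQueueId;
    }
}
```

With an old binding for queue id 100 already in the map, calling addIfAbsent
with a binding for queue id 200 under the same clusterName leaves routeTarget
pointing at 100, the temp queue that no longer exists. Calling addOrReplace
instead returns the old binding and routes to 200.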







--
This message was sent by Atlassian Jira
(v8.3.4#803005)
