[ https://issues.apache.org/jira/browse/ARTEMIS-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196996#comment-17196996 ]
ASF subversion and git services commented on ARTEMIS-2854:
----------------------------------------------------------
Commit fe5b81fd5564bcc1a6c621cee776285297247964 in activemq-artemis's branch
refs/heads/master from Howard Gao
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=fe5b81f ]
ARTEMIS-2854 Non-durable subscribers stop receiving after failover
In a cluster scenario where non-durable subscribers fail over to
backup while another live node is forwarding messages to them,
there is a chance that the live node keeps the old remote
binding for the subs, and messages routed to those
old remote bindings will result in "binding not found".
> Non-durable subscribers may stop receiving after failover
> ---------------------------------------------------------
>
> Key: ARTEMIS-2854
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2854
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.14.0
> Reporter: Howard Gao
> Assignee: Howard Gao
> Priority: Major
> Fix For: 2.16.0
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In a cluster scenario where non-durable subscribers fail over to backup while
> another live node is forwarding messages to them, there is a chance that the
> live node keeps the old remote binding for the subs, and messages routed to those
> old remote bindings will result in "binding not found".
> For example, suppose there are 2 live-backup pairs in the cluster: Live1 and
> backup1, Live2 and backup2. A non-durable subscriber connects to Live1, and
> messages are sent to Live2 and then redistributed to the sub on Live1.
> Now Live1 crashes and backup1 becomes live. The subscriber fails over to
> backup1.
> In the meantime Live2 reconnects to backup1 too. During the process Live2
> does not successfully remove the old remote binding for the subs, and it
> still points to the old temp queue's id (which is gone with Live1, as it's
> a temp queue).
> So messages sent after failover are still routed to the old queue, which is
> no longer there. The subscriber will sit idle without receiving the new
> messages forwarded by Live2.
> The code concerned is:
> https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/cluster/impl/ClusterConnectionImpl.java#L1239
> The code doesn't handle the case where the old remote binding is still in
> the map and its key (clusterName) is the same as that of the new remote
> binding (which references a new temp queue) recreated on failover.
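
The key collision described above can be illustrated with a minimal, self-contained sketch. It is not the Artemis code and not the fix from the commit: the class RemoteBindingSketch, its Binding type, and the methods addNaive, addReplacingStale and routeTarget are all hypothetical names chosen for illustration. It only assumes what the description states, namely that remote bindings are kept in a map keyed by clusterName and that each binding references a remote queue id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are taken from the Artemis code base.
class RemoteBindingSketch {

    // Minimal stand-in for a remote binding: its map key and the remote queue id it points at.
    static final class Binding {
        final String clusterName;
        final long remoteQueueId;

        Binding(String clusterName, long remoteQueueId) {
            this.clusterName = clusterName;
            this.remoteQueueId = remoteQueueId;
        }
    }

    // Assumption: bindings are stored in a map keyed by clusterName.
    private final Map<String, Binding> bindings = new HashMap<>();

    // Naive handling: an existing entry under the same key is kept,
    // so a binding that was never removed survives the failover.
    Binding addNaive(String clusterName, long remoteQueueId) {
        return bindings.computeIfAbsent(clusterName, k -> new Binding(k, remoteQueueId));
    }

    // Defensive handling: an existing entry that points at a different queue id
    // is treated as stale and replaced by the newly announced binding.
    Binding addReplacingStale(String clusterName, long remoteQueueId) {
        Binding existing = bindings.get(clusterName);
        if (existing != null && existing.remoteQueueId == remoteQueueId) {
            return existing;
        }
        Binding fresh = new Binding(clusterName, remoteQueueId);
        bindings.put(clusterName, fresh);
        return fresh;
    }

    // Returns the queue id that messages for this key would be routed to.
    long routeTarget(String clusterName) {
        return bindings.get(clusterName).remoteQueueId;
    }
}
```

With the naive variant, announcing the same clusterName twice with different queue ids leaves routing pointed at the first (now gone) queue, which matches the symptom in the report. The defensive variant switches to the new queue id. Whether the real fix takes this shape is not stated in this message.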