[
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441385#comment-16441385
]
Catalin Alexandru Zamfir edited comment on ARTEMIS-1285 at 4/17/18 8:06 PM:
----------------------------------------------------------------------------
The spin-off issues were provided as reference only; one has a reproducible
attachment. I know the data isn't replicated to the last replica (r3) until
the 1st replica goes down. :) I've read the manual top to bottom.
The underlying problem is that a fresh install of Artemis with a live (r1)
plus r2 (1st replica) and r3 (2nd replica) sends the "r3" instance into that
logging loop. So we can't even reach the situation of having the 1x live + 2
backups, due to a bug (probably in locating the node). ENTMQBR-822 seems to
have narrowed it down to NamedLiveNodeLocatorForReplication.java. GitHub shows
some recent activity (the last commit on this file) with changes affecting
this code. Commit:
[https://github.com/apache/activemq-artemis/commit/4a57aecbbfea6453e9d74ba398ea0f89ee28fdbb]
It seems tagged for 2.1.0 onwards (up to 2.5.0). A regression, maybe? I'm not
all that familiar with the code base.
Later edit: it seems that after a manual restart of the r3 instance it enters
the proper cluster configuration, waiting for r2 to fail. After that, any
subsequent failover works in whichever direction you take it. We could
"automate" this manual step by watching the logs for the error and, if "r3"
happens to be issuing it, restarting that instance (but that's a hack). The
"Server is stopped" loop is always reproducible on a fresh install.
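The log-watching hack above could be sketched roughly like this (the log path
and the restart command are placeholders I made up, not anything from our
setup or from Artemis itself; only the "Server is stopped" message comes from
the ticket):

```shell
#!/bin/sh
# Hypothetical watchdog for the manual-restart workaround described above.
# The log location and the restart command are guesses, adjust for your hosts.

# check_stuck LOGFILE: succeed if the log shows the "Server is stopped" loop.
check_stuck() {
  grep -q "Server is stopped" "$1"
}

# restart_r3: placeholder for however the r3 broker is managed.
restart_r3() {
  echo "r3 appears stuck; restarting"
  # e.g. /var/lib/artemis/r3/bin/artemis-service restart
}

main() {
  log="${1:-/var/lib/artemis/r3/log/artemis.log}"
  if check_stuck "$log"; then
    restart_r3
  fi
}
```

Run from cron or a systemd timer, it would paper over the bug, but it doesn't
fix the root cause.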
Anyway, happy to have shared our experience with this topology. Hope someone
more familiar can help find the root cause.
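For context, a 1 live + 2 backup replication topology of this shape is
normally declared through the ha-policy section of each broker.xml; a shared
group-name is what routes the backups through
NamedLiveNodeLocatorForReplication. A minimal sketch (element values here are
illustrative, not our exact config):

```xml
<!-- r1 (live) broker.xml, HA section only -->
<ha-policy>
  <replication>
    <master>
      <group-name>repl-group-1</group-name>
      <check-for-live-server>true</check-for-live-server>
    </master>
  </replication>
</ha-policy>

<!-- r2 and r3 (backups) broker.xml, HA section only -->
<ha-policy>
  <replication>
    <slave>
      <group-name>repl-group-1</group-name>
      <allow-failback>true</allow-failback>
    </slave>
  </replication>
</ha-policy>
```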
> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
> Key: ARTEMIS-1285
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.1.0
> Reporter: yangwei
> Priority: Major
>
> We have a cluster of 3 instances: A is the master, B the slave, and C the
> standby slave. When the slave is down, we expect C to announce replication to
> A, but A stays in standalone mode the whole time. Via the jstack command we
> see C waiting at "nodeLocator.locateNode()".
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)