[
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441385#comment-16441385
]
Catalin Alexandru Zamfir edited comment on ARTEMIS-1285 at 4/17/18 8:06 PM:
----------------------------------------------------------------------------
The spin-off issues were provided as reference only; one has a reproducible
attachment. I know the data isn't replicated to the last replica (r3) until
the 1st replica goes down. :) I've read the manual top to bottom.
The underlying problem is that a fresh install of Artemis with a live (r1)
plus r2 (1st replica) and r3 (2nd replica) sends the "r3" instance into that
logging loop. So we can't even reach the situation of having the 1x live + 2
backups, due to a bug (probably in locating the node). ENTMQBR-822 seems to
have narrowed it down to NamedLiveNodeLocatorForReplication.java. GitHub shows
some recent activity (the last commit on this file) with changes affecting
this code. Commit:
[https://github.com/apache/activemq-artemis/commit/4a57aecbbfea6453e9d74ba398ea0f89ee28fdbb]
It seems tagged for 2.1.0 onwards (up to 2.5.0). A regression, maybe? I'm not
all that familiar with the code base.
Later edit: it seems that after a manual restart of the r3 instance it enters
the proper cluster configuration, waiting for r2 to fail. After that, any
subsequent failover works in whichever direction you take it. We could
"automate" this manual step by watching the logs for the error and, if "r3"
happens to be issuing it, restarting that instance (but that's a hack). The
"Server is stopped" loop is always reproducible on a fresh install.
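The log-watching hack above could be sketched roughly like this (the log path
and the restart command are placeholders I made up, not anything from our
setup or from Artemis itself; only the "Server is stopped" message comes from
the ticket):

```shell
#!/bin/sh
# Hypothetical watchdog for the manual-restart workaround described above.
# The log location and the restart command are guesses, adjust for your hosts.

# check_stuck LOGFILE: succeed if the log shows the "Server is stopped" loop.
check_stuck() {
  grep -q "Server is stopped" "$1"
}

# restart_r3: placeholder for however the r3 broker is managed.
restart_r3() {
  echo "r3 appears stuck; restarting"
  # e.g. /var/lib/artemis/r3/bin/artemis-service restart
}

main() {
  log="${1:-/var/lib/artemis/r3/log/artemis.log}"
  if check_stuck "$log"; then
    restart_r3
  fi
}
```

Run from cron or a systemd timer, it would paper over the bug, but it doesn't
fix the root cause.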
Anyway, happy to have shared our experience with this topology. Hope someone
more familiar can help find the root cause.
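For context, a 1 live + 2 backup replication topology of this shape is
normally declared through the ha-policy section of each broker.xml; a shared
group-name is what routes the backups through
NamedLiveNodeLocatorForReplication. A minimal sketch (element values here are
illustrative, not our exact config):

```xml
<!-- r1 (live) broker.xml, HA section only -->
<ha-policy>
  <replication>
    <master>
      <group-name>repl-group-1</group-name>
      <check-for-live-server>true</check-for-live-server>
    </master>
  </replication>
</ha-policy>

<!-- r2 and r3 (backups) broker.xml, HA section only -->
<ha-policy>
  <replication>
    <slave>
      <group-name>repl-group-1</group-name>
      <allow-failback>true</allow-failback>
    </slave>
  </replication>
</ha-policy>
```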
> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
> Key: ARTEMIS-1285
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.1.0
> Reporter: yangwei
> Priority: Major
>
> We have a cluster of 3 instances: A is the master, B the slave, and C the
> standby slave. When the slave is down, we expect C to announce replication to
> A, but A stays in standalone mode the whole time. Via the jstack command we
> see C waiting at "nodeLocator.locateNode()".
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)