[ 
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442174#comment-16442174
 ] 

Catalin Alexandru Zamfir commented on ARTEMIS-1285:
---------------------------------------------------

Ok, so we have 2 issues here:
 * On the 2nd (standby) backup, if you have the Hawtio console open, the 
broker repeatedly logs "AMQ222040: Server is stopped", spamming the logs;
 ** the message was misleading and is what pointed me to this ticket in the 
end.
 * We've tested all possible failure scenarios:
 ** the one in the examples works: when the master fails first, the 1st backup 
becomes live and the 2nd backup becomes the active backup to the current live 
(the former 1st backup);
 ** if, however, the master is live but the 1st backup fails (the issue here 
in ARTEMIS-1285), the secondary slave doesn't take over as active backup. It 
doesn't vote and doesn't detect that the 1st backup failed. It just sits and 
waits;
 *** if we restart the 1st backup, restoring "the link", then failover occurs 
as in the examples.

The fact is that, in real life, there should be a "competition" between the 
backups (e.g. a ZooKeeper leader election among the backup instances) for the 
single "live". That way, if your 1st backup fails but your master has not yet 
failed, your secondary backup gets in sync with your master. This would 
improve the fault tolerance of the cluster as a whole.
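
To make the "competition" idea concrete, here is a minimal toy sketch in 
Python. It is not Artemis code and uses no Artemis or ZooKeeper API: the 
`Cluster` class, its methods, and the "lowest name wins" rule are hypothetical 
stand-ins for a real vote or leader election among the backups.

```python
class Cluster:
    """Toy model (hypothetical): one live broker plus competing backups."""

    def __init__(self, live, backups):
        self.live = live
        self.candidates = list(backups)  # backups that are currently up
        self.active_backup = None        # the backup replicating from live
        self.elect()

    def elect(self):
        # Assumption: "lowest name wins" stands in for a real election.
        if self.active_backup is None and self.candidates:
            self.active_backup = min(self.candidates)

    def fail(self, node):
        if node == self.live:
            # The active backup (if any) is promoted to live.
            self.live = self.active_backup
            if self.live is not None:
                self.candidates.remove(self.live)
            self.active_backup = None
        else:
            if node in self.candidates:
                self.candidates.remove(node)
            if node == self.active_backup:
                self.active_backup = None
        # Remaining backups compete again after every failure.
        self.elect()


# Scenario from this ticket: master A stays live, 1st backup B fails.
cluster = Cluster("A", ["B", "C"])
cluster.fail("B")
print(cluster.live, cluster.active_backup)  # prints "A C"
```

With a re-election after every failure, C becomes the active backup for A as 
soon as B goes away, instead of waiting for A to fail first.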

I agree with Denis here: either we document this specific situation 
(ARTEMIS-1285), so that people are aware that the secondary backup only helps 
in the specific failure sequence shown in the examples (master first, then 1st 
backup, then 2nd backup), or the logic should promote some competition between 
the backups (by voting on who wins the ability to become backup for the given 
live in the group).

> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1285
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.1.0
>            Reporter: yangwei
>            Priority: Major
>
> We have a cluster of 3 instances: A is master, B is slave and C is standby 
> slave. When slave is down, we expect C announces replication to A but A is in 
> standalone mode all the time. We see C waits at "nodeLocator.locateNode()" 
> through jstack command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
