[jira] [Commented] (ARTEMIS-1285) Standby slave would not announce replication to master when the slave is down

Justin Bertram (JIRA) Tue, 17 Apr 2018 19:06:50 -0700

    [ https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441763#comment-16441763 ]


Justin Bertram commented on ARTEMIS-1285:
-----------------------------------------

> The original issue was about killing current slave (node2). This makes the 
> hole between master (node1) and standby slave (node3) - node3 and node1 know 
> nothing about each other. And as a result they can't communicate with each 
> other and can't pass master role...

I understand that. I wasn't trying to deny this particular problem exists. 

That said, [~Antauri] raised a more fundamental issue, saying:

> The underlying problem is that on a fresh install of Artemis with live (r1) 
> plus r2 (1st replica) and r3 (2nd replica) makes the "r3" instance go into 
> that logging loop. So we can't even reach the situation of having the 1x live 
> + 2 backups due to a bug (probably in locating the node).

The evidence I have from the "replicated-multiple-failover" example suggests 
that this isn't a problem: if it were, the example wouldn't even run. However, 
it still needs to be investigated, because a more fundamental problem here 
would have to be fixed before the original issue can be addressed. So there 
are 2 issues here which need to be dealt with:

# The original issue where, if the active backup dies, the additional backup 
does not take over.
# The issue which [~Antauri] is describing, where a live-backup-backup 
configuration (e.g. from the "replicated-multiple-failover" example) can't even 
be established.

> Also I can't agree with this one. 3-nodes deployment is quite common.

I guess we can agree to disagree here. The original statement from [~Antauri] 
was that, "Most deployments will prefer an 3x data replication." I've been a 
developer on Artemis since it was donated to Apache in late 2014 and before 
that on HornetQ (where Artemis came from) for several years. Most deployments 
don't even use HA; they are simple one-broker deployments. Many of the 
deployments that do use HA use shared storage. Even among the remaining 
deployments which use replication for HA, a live-backup-backup configuration is 
not common. If most deployments were using live-backup-backup then this issue 
would have been discovered and fixed long ago. But I digress.

> The most useful case (at least for me) is to avoid split-brains when 2 nodes 
> think they are masters.

I don't believe a live-backup-backup configuration would be effective at 
mitigating split-brain because, when the connection between the live and the 
active backup fails, the passive backup will not participate in the quorum 
voting.
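
To illustrate the point about voting, here is a minimal toy sketch of a 
generic strict-majority check. It is not Artemis's actual quorum 
implementation; the node names and the has_majority helper are hypothetical, 
and the only assumption taken from the discussion is that the passive backup 
is not a voter.

```python
def has_majority(reachable, voters):
    """Return True if the reachable voters form a strict majority of all voters."""
    return len(reachable & voters) * 2 > len(voters)


# Hypothetical node names. Assumption: the passive backup does not vote.
voters = {"live", "active_backup"}

# The link between the live and the active backup fails. Each side still
# reaches itself and the passive backup, but not the other voter.
live_view = {"live", "passive_backup"}
backup_view = {"active_backup", "passive_backup"}

# Each side holds 1 of 2 votes, so neither can reach a strict majority.
live_can_decide = has_majority(live_view, voters)
backup_can_decide = has_majority(backup_view, voters)
print(live_can_decide, backup_can_decide)
```

In this toy model, reaching the passive backup changes nothing for either 
side, because a node that does not vote cannot break the tie.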

> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1285
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.1.0
>            Reporter: yangwei
>            Priority: Major
>
> We have a cluster of 3 instances: A is master, B is slave and C is standby 
> slave. When slave is down, we expect C announces replication to A but A is in 
> standalone mode all the time. We see C waits at "nodeLocator.locateNode()" 
> through jstack command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
