[
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441763#comment-16441763
]
Justin Bertram commented on ARTEMIS-1285:
-----------------------------------------
> The original issue was about killing current slave (node2). This makes the
> hole between master (node1) and standby slave (node3) - node3 and node1 know
> nothing about each other. And as a result they can't communicate with each
> other and can't pass master role...
I understand that. I wasn't trying to deny this particular problem exists.
That said, [~Antauri] brought up a more fundamental issue saying:
> The underlying problem is that on a fresh install of Artemis with live (r1)
> plus r2 (1st replica) and r3 (2nd replica) makes the "r3" instance go into
> that logging loop. So we can't even reach the situation of having the 1x live
> + 2 backups due to a bug (probably in locating the node).
The evidence I have from the "replicated-multiple-failover" example indicates
that this isn't a problem, because if it were then the example wouldn't even
run. However, this needs to be investigated, because a more fundamental problem
here would have to be addressed before the original issue can be addressed. So
there are 2 issues here which need to be dealt with:
# The original issue: if the active backup dies, the additional backup does not
take over.
# The issue which [~Antauri] is describing where a live-backup-backup
configuration (e.g. from the "replicated-multiple-failover" example) can't even
be established.
> Also I can't agree with this one. 3-nodes deployment is quite common.
I guess we can agree to disagree here. The original statement from [~Antauri]
was that, "Most deployments will prefer an 3x data replication." I've been a
developer on Artemis since it was donated to Apache in late 2014 and before
that on HornetQ (where Artemis came from) for several years. Most deployments
don't even use HA; they are simple one-broker deployments. Many of the
deployments that do use HA use shared storage. Even among the remaining
deployments which use replication for HA, a live-backup-backup configuration is
not common. If most deployments were using live-backup-backup then this issue
would have been discovered and fixed long ago. But I digress.
> The most useful case (at least for me) is to avoid split-brains when 2 nodes
> think they are masters.
I don't believe a live-backup-backup configuration would be effective at
mitigating split-brain because, when the connection between the live and the
active backup fails, the passive backup will not participate in the quorum
voting.
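
To illustrate why a non-voting passive backup doesn't help here, this is a
minimal sketch of a generic strict-majority quorum. It is a toy model, not
Artemis's actual vote implementation: the node names, `has_majority`, and
`sides_with_majority` are all hypothetical and exist only for this example.

```python
from typing import List, Set


def has_majority(reachable_voters: int, total_voters: int) -> bool:
    """Return True if a side holding `reachable_voters` of `total_voters`
    votes has a strict majority (generic model, not Artemis's algorithm)."""
    return 2 * reachable_voters > total_voters


def sides_with_majority(partition: List[Set[str]], voters: Set[str]) -> List[List[str]]:
    """Return the sides of a network partition that hold a strict majority.

    `partition` lists the groups of nodes that can still reach each other;
    `voters` is the set of nodes assumed to take part in the vote.
    """
    total = len(voters)
    return [
        sorted(side)
        for side in partition
        if has_majority(len(side & voters), total)
    ]


# Hypothetical scenario: the live loses its connection to both backups.
partition = [{"live"}, {"backup", "passive"}]

# If all three nodes voted, exactly one side could win the vote.
all_vote = sides_with_majority(partition, {"live", "backup", "passive"})

# If the passive backup does not vote, neither side has a majority.
passive_abstains = sides_with_majority(partition, {"live", "backup"})

print(all_vote)
print(passive_abstains)
```

Under these assumptions the third node only helps if it actually votes. With
the passive backup excluded, the vote is effectively between two nodes and a
partition leaves a 1-1 tie.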
> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
> Key: ARTEMIS-1285
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.1.0
> Reporter: yangwei
> Priority: Major
>
> We have a cluster of 3 instances: A is master, B is slave and C is standby
> slave. When slave is down, we expect C announces replication to A but A is in
> standalone mode all the time. We see C waits at "nodeLocator.locateNode()"
> through jstack command.