[ 
https://issues.apache.org/jira/browse/ARTEMIS-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441763#comment-16441763
 ] 

Justin Bertram edited comment on ARTEMIS-1285 at 4/18/18 2:05 AM:
------------------------------------------------------------------

bq. The original issue was about killing current slave (node2). This makes the 
hole between master (node1) and standby slave (node3) - node3 and node1 know 
nothing about each other. And as a result they can't communicate with each 
other and can't pass master role...

I understand that. I wasn't trying to deny this particular problem exists. 

That said, [~Antauri] brought up a more fundamental issue saying:

bq. The underlying problem is that on a fresh install of Artemis with live (r1) 
plus r2 (1st replica) and r3 (2nd replica) makes the "r3" instance go into that 
logging loop. So we can't even reach the situation of having the 1x live + 2 
backups due to a bug (probably in locating the node).

The evidence I have from the "replicated-multiple-failover" example indicates 
that this isn't a problem, because if it were then the example wouldn't even 
run. However, this needs to be investigated, because if there is a more 
fundamental problem here it would need to be addressed before the original 
issue can be addressed. So there are two issues here which need to be dealt with:

# The original issue where if the active backup dies then the additional backup 
does not take over.
# The issue which [~Antauri] is describing where a live-backup-backup 
configuration (e.g. from the "replicated-multiple-failover" example) can't even 
be established.
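For reference, the ha-policy fragments for a live-backup-backup replication group along the lines of the "replicated-multiple-failover" example look roughly like this. This is only a sketch: the connectors and the cluster-connection the brokers use to discover each other are omitted, and element availability may vary by broker version.

```xml
<!-- live broker's broker.xml (ha-policy fragment only) -->
<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

<!-- each of the two backup brokers' broker.xml (ha-policy fragment only) -->
<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
```

Both backups point at the same cluster via the cluster-connection; whichever backup wins the initial replication handshake becomes the active backup, and the other remains passive, which is the state the two issues above are about.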

bq. Also I can't agree with this one. 3-nodes deployment is quite common.

I guess we can agree to disagree here. The original statement from [~Antauri] 
was that "Most deployments will prefer an 3x data replication." I've been a 
developer on Artemis since it was donated to Apache in late 2014, and before 
that on HornetQ (from which Artemis came) for several years. Most deployments 
don't even use HA; they are simple one-broker deployments. Many of the 
deployments that do use HA use shared storage. Even among the remaining 
deployments which use replication for HA, a live-backup-backup configuration is 
not common. If most deployments were using live-backup-backup then this issue 
would have been discovered and fixed long ago. But I digress.

bq. The most useful case (at least for me) is to avoid split-brains when 2 
nodes think they are masters.

I don't believe a live-backup-backup configuration would be effective at 
mitigating split-brain because, when the connection between the live and the 
active backup fails, the passive backup will not participate in the quorum 
voting, so there is no third node to arbitrate between the two that have lost 
contact with each other.
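For context, quorum voting on the live side is enabled through elements like the following in broker.xml. The element names below follow the Artemis HA documentation, but treat this as a sketch assuming a broker version that supports them; the point above is that the vote is held among the other members the live can see, and a passive backup that has not yet started replicating is not one of them.

```xml
<!-- live broker's ha-policy with quorum voting enabled (sketch) -->
<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
         <!-- ask the cluster to vote before shutting down or staying live
              when replication to the backup is lost -->
         <vote-on-replication-failure>true</vote-on-replication-failure>
         <!-- number of voters required for a quorum -->
         <quorum-size>2</quorum-size>
      </master>
   </replication>
</ha-policy>
```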



> Standby slave would not announce replication to master when the slave is down
> -----------------------------------------------------------------------------
>
>                 Key: ARTEMIS-1285
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-1285
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.1.0
>            Reporter: yangwei
>            Priority: Major
>
> We have a cluster of 3 instances: A is the master, B is the slave and C is the 
> standby slave. When the slave is down, we expect C to announce replication to 
> A, but A stays in standalone mode the whole time. We see C waiting at 
> "nodeLocator.locateNode()" via the jstack command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
