[
https://issues.apache.org/jira/browse/ARTEMIS-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Wood updated ARTEMIS-2690:
---------------------------------
Comment: was deleted
(was: I have Artemis building on my laptop and would like to try resolving this
issue.
I see what looks like a good spot to add a quorum check, but want to run it by
everyone:
if (replicatedPolicy.isCheckForLiveServer() && isNodeIdUsed()) {
Maybe add another call to a local function for the quorum vote here?
On second thought, I think I need to go deeper into the code, since isNodeIdUsed
is a little misleading.
)
> Intermittent network failure caused live and replica to both be live
> --------------------------------------------------------------------
>
> Key: ARTEMIS-2690
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2690
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.11.0
> Environment: Artemis 2.11.0, Ubuntu 18.04
> Reporter: Sebastian Lövdahl
> Priority: Major
> Attachments: live1-artemis.log, live1-broker.xml, live2-artemis.log,
> live2-broker.xml, live3-artemis.log, live3-broker.xml, replica1-artemis.log,
> replica1-broker.xml
>
>
> An intermittent network failure caused both the live and replica to be live.
> Both happily accepted incoming connections until the node that was supposed
> to be the replica was manually shut down. Log files from all 4 nodes are
> attached. The {{replica1}} node happened to have some TRACE logging enabled
> as well.
>
> As far as I have understood the documentation, the setup should be safe from
> a split brain point of view. The live2 and live3 nodes intentionally don't
> have any replicas at the moment. Complete {{broker.xml}} files are attached,
> but for reference, this is the {{ha-policy}}:
> live1:
> {code:xml}
> <ha-policy>
> <replication>
> <master>
> <cluster-name>my-cluster</cluster-name>
> <group-name>group1</group-name>
> <check-for-live-server>true</check-for-live-server>
> <vote-on-replication-failure>true</vote-on-replication-failure>
> </master>
> </replication>
> </ha-policy>
> {code}
> replica1:
> {code:xml}
> <ha-policy>
> <replication>
> <slave>
> <cluster-name>my-cluster</cluster-name>
> <group-name>group1</group-name>
> <allow-failback>true</allow-failback>
> <vote-on-replication-failure>true</vote-on-replication-failure>
> </slave>
> </replication>
> </ha-policy>
> {code}
> live2:
> {code:xml}
> <ha-policy>
> <replication>
> <master>
> <cluster-name>my-cluster</cluster-name>
> <group-name>group2</group-name>
> <check-for-live-server>true</check-for-live-server>
> <vote-on-replication-failure>true</vote-on-replication-failure>
> </master>
> </replication>
> </ha-policy>
> {code}
> live3:
> {code:xml}
> <ha-policy>
> <replication>
> <master>
> <cluster-name>my-cluster</cluster-name>
> <group-name>group2</group-name>
> <check-for-live-server>true</check-for-live-server>
> <vote-on-replication-failure>true</vote-on-replication-failure>
> </master>
> </replication>
> </ha-policy>
> {code}
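
For readers unfamiliar with why a quorum vote is expected to prevent the situation described above, here is a minimal sketch of strict-majority voting. It is not Artemis code: the class QuorumSketch and its methods majority and mayBecomeLive are hypothetical names invented for illustration, and the sketch assumes every voter counts equally and that the cluster size is known in advance.

```java
// Hypothetical illustration only; this is NOT how ActiveMQ Artemis implements its vote.
public final class QuorumSketch {

    private QuorumSketch() {
    }

    /** Smallest number of votes that forms a strict majority of clusterSize voters. */
    public static int majority(int clusterSize) {
        if (clusterSize < 1) {
            throw new IllegalArgumentException("clusterSize must be at least 1");
        }
        return clusterSize / 2 + 1;
    }

    /**
     * A node may activate only if the voters it can still reach (assumed here to
     * include itself) form a strict majority of the whole cluster.
     */
    public static boolean mayBecomeLive(int reachableVoters, int clusterSize) {
        return reachableVoters >= majority(clusterSize);
    }

    public static void main(String[] args) {
        // With three voters, the side of a partition that sees two voters may proceed...
        System.out.println(mayBecomeLive(2, 3));
        // ...while the side that sees only one voter must not.
        System.out.println(mayBecomeLive(1, 3));
    }
}
```

Under these assumptions, two disjoint sides of a network partition cannot both hold a strict majority, so at most one of them would activate. Whether the vote in 2.11.0 behaves this way for the attached configuration is exactly what this issue questions.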
--
This message was sent by Atlassian Jira
(v8.3.4#803005)