[jira] [Created] (ARTEMIS-2690) Intermittent network failure caused live and replica to both be live

Jira Fri, 03 Apr 2020 01:38:10 -0700

Sebastian Lövdahl created ARTEMIS-2690:
------------------------------------------


             Summary: Intermittent network failure caused live and replica to 
both be live
                 Key: ARTEMIS-2690
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2690
             Project: ActiveMQ Artemis
          Issue Type: Bug
    Affects Versions: 2.11.0
         Environment: Ubuntu 18.04
            Reporter: Sebastian Lövdahl
         Attachments: live1-artemis.log, live1-broker.xml, live2-artemis.log, 
live2-broker.xml, live3-artemis.log, live3-broker.xml, replica1-artemis.log, 
replica1-broker.xml

An intermittent network failure caused both the live and replica to be live. 
Both happily accepted incoming connections until the node that was supposed to 
be the replica was manually shut down. Log files from all 4 nodes are attached. 
The {{replica1}} node happened to have some TRACE logging enabled as well.

As far as I understand the documentation, this setup should be safe from 
split brain. The live2 and live3 nodes intentionally don't have 
any replicas at the moment. Complete {{broker.xml}} files are attached, but for 
reference, this is the {{ha-policy}} of each node:

live1:
{code:xml}
<ha-policy>
 <replication>
    <master>
       <cluster-name>my-cluster</cluster-name>
       <group-name>group1</group-name>
       <check-for-live-server>true</check-for-live-server>
       <vote-on-replication-failure>true</vote-on-replication-failure>
    </master>
 </replication>
</ha-policy>
{code}
replica1:
{code:xml}
<ha-policy>
 <replication>
    <slave>
       <cluster-name>my-cluster</cluster-name>
       <group-name>group1</group-name>
       <allow-failback>true</allow-failback>
       <vote-on-replication-failure>true</vote-on-replication-failure>
    </slave>
 </replication>
</ha-policy>
{code}
live2:
{code:xml}
<ha-policy>
 <replication>
    <master>
       <cluster-name>my-cluster</cluster-name>
       <group-name>group2</group-name>
       <check-for-live-server>true</check-for-live-server>
       <vote-on-replication-failure>true</vote-on-replication-failure>
    </master>
 </replication>
</ha-policy>
{code}
live3:
{code:xml}
<ha-policy>
 <replication>
    <master>
       <cluster-name>my-cluster</cluster-name>
       <group-name>group2</group-name>
       <check-for-live-server>true</check-for-live-server>
       <vote-on-replication-failure>true</vote-on-replication-failure>
    </master>
 </replication>
</ha-policy>
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
