Gaurav created ARTEMIS-2421:
-------------------------------
Summary: Both Live and Backup node acting as Live and serving
after failover happened due to network failure
Key: ARTEMIS-2421
URL: https://issues.apache.org/jira/browse/ARTEMIS-2421
Project: ActiveMQ Artemis
Issue Type: Bug
Components: Broker, STOMP
Affects Versions: 2.6.4
Reporter: Gaurav
Assignee: Justin Bertram
Attachments: broker_master.xml, broker_slave.xml
We have a Live-Backup server configuration: a single Artemis Live server
(version 2.6.4) backed up by a single Backup server, using a shared file
system as persistent storage.
Please refer to the attachments for the Live and Backup broker configurations.
*Fail Over Scenario*
# Node 1 acts as the Live node and serves requests, while Node 2 acts as the
standby (passive) node. No consumer is connected to either node.
# Push 5 messages and verify that the message count is 5.
# Simulate a NIC (network) failure on the Node 1 server (i.e. the cluster can
no longer connect to Node 1). This makes Node 2 active, and we can see that the
5 messages pushed in step 2 were successfully replicated to Node 2.
# Restore the network connection for Node 1. This is where we face the issue:
both nodes now act as Live nodes, and the following error is logged
continuously:
{quote}{{{color:#FF0000}AMQ212034: There are more than one servers on the
network broadcasting the same node id. You will see this message exactly once
(per node) if a node is restarted, in which case it can be safely ignored. But
if it is logged continuously it means you really do have more than one node on
the same network active concurrently with the same node id. This could occur
if you have a backup node active at the same time as its live node.
nodeID=cd323206-4adc-11e9-814b-506b8d4ee653{color}}}
{quote}
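The warning text itself distinguishes a one-off occurrence (a node restart) from a continuous one (two live nodes sharing a node id). As a rough way to check which case a broker log falls into, here is a minimal Python sketch that counts AMQ212034 warnings per nodeID. The function names and the threshold of 2 are assumptions made for illustration, not part of Artemis, and the sketch assumes each warning is logged on a single line.

```python
import re
from collections import Counter

# Warning code taken from the log message quoted above.
WARNING_CODE = "AMQ212034"
# The nodeID is a UUID-like token appended to the warning text.
NODE_ID = re.compile(r"nodeID=([0-9a-fA-F-]+)")


def count_duplicate_node_warnings(lines):
    """Return a Counter mapping nodeID -> number of AMQ212034 warnings seen."""
    counts = Counter()
    for line in lines:
        if WARNING_CODE not in line:
            continue
        match = NODE_ID.search(line)
        # Fall back to a placeholder key if the nodeID is missing from the line.
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def looks_like_split_brain(lines, threshold=2):
    """Heuristic: a single warning per nodeID is expected after a restart;
    `threshold` or more for the same nodeID suggests two live nodes.
    The threshold value is an assumption, not an Artemis setting."""
    counts = count_duplicate_node_warnings(lines)
    return any(n >= threshold for n in counts.values())
```

It could be run over the lines of the broker log file on each node, for example `looks_like_split_brain(open(path))`, where `path` is wherever that node writes its log.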
This leaves the entire cluster in an inconsistent state, and we are able to
push messages to both nodes.
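One way to observe this symptom from a client machine is to probe the acceptor port on each node. The Python sketch below only checks whether a TCP connection succeeds. It assumes that a passive backup does not accept client connections on that port, so both nodes accepting at once hints that both are live. The host names and port 61613 in the usage comment are placeholders, not values taken from the attached configurations.

```python
import socket


def accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def all_accepting(nodes, timeout=2.0):
    """Return True if every (host, port) pair in a non-empty list accepts connections."""
    return bool(nodes) and all(
        accepts_connections(host, port, timeout) for host, port in nodes
    )


# Example usage (hypothetical host names and port):
# all_accepting([("node1.example", 61613), ("node2.example", 61613)])
```

A successful connection only shows that something is listening on the port, so this is a quick check rather than proof that a node is live.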
Any pointers on this issue would be much appreciated!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)