Bob Mitchell created ARTEMIS-2568:
-------------------------------------
Summary: Race condition between failover processing and master
restart can cause split brain
Key: ARTEMIS-2568
URL: https://issues.apache.org/jira/browse/ARTEMIS-2568
Project: ActiveMQ Artemis
Issue Type: Bug
Affects Versions: 2.10.1
Reporter: Bob Mitchell
We have seen split brain in the following sequence of events when using
replicating backups with failback:
# Master fails or is shut down
# Backup detects failure and starts to failover
# Master is restarted before Backup becomes "live"
# The master's check for a "duplicate" server finds nothing because the backup is not live yet
# Master and backup both become live.
At the very least, we would like the window in which this can occur to be
reduced, possibly by having the backup check again whether the master is
available just before going live. It might also be necessary to have the
master check for a duplicate server as a last step before going live.
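
To illustrate the kind of "last step" check proposed above, here is a minimal sketch in Java. It is not Artemis code: the class LastStepCheck, the peerVisible probe, and tryGoLive are hypothetical names invented for this example, and the probe is assumed to return true whenever another server for the same live/backup pair is reachable. The same logic could apply to either the backup or the restarted master.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch only; none of these names are Artemis APIs.
class LastStepCheck {
    enum State { PASSIVE, ACTIVATING, LIVE }

    // Assumed probe: true if another server of the pair is reachable.
    private final BooleanSupplier peerVisible;
    private State state = State.PASSIVE;

    LastStepCheck(BooleanSupplier peerVisible) {
        this.peerVisible = peerVisible;
    }

    State state() {
        return state;
    }

    // Performs the activation work, then repeats the peer check as the
    // final step, so a peer that appeared during activation is noticed.
    boolean tryGoLive(Runnable activationWork) {
        if (peerVisible.getAsBoolean()) {
            return false;               // early check: peer already present
        }
        state = State.ACTIVATING;
        activationWork.run();           // the window in which a peer may start
        if (peerVisible.getAsBoolean()) {
            state = State.PASSIVE;      // peer appeared meanwhile: back off
            return false;
        }
        state = State.LIVE;
        return true;
    }
}
```

A re-check like this only narrows the window. Two servers could still pass their final checks at the same moment, so it reduces the chance of split brain rather than eliminating it.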
--
This message was sent by Atlassian Jira
(v8.3.4#803005)