[ 
https://issues.apache.org/jira/browse/ARTEMIS-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994767#comment-16994767
 ] 

Thomas Wood commented on ARTEMIS-2568:
--------------------------------------

We experience this all the time and are relying on a custom ping to help delay 
the restart of the master.
Maybe a time setting to delay the restart would help with this?

> Race condition between failover processing and master restart can cause split 
> brain
> -----------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-2568
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2568
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.10.1
>            Reporter: Bob Mitchell
>            Priority: Major
>
> We have seen split brain in the following sequence of events when using 
> replicating backups with failback:
>  # Master fails or is shut down
>  # Backup detects failure and starts to failover
>  # Master is restarted before Backup becomes "live"
>  # Its check for a "duplicate" server fails because the backup is not live yet
>  # Master and backup both become live.
> At the very least, we would like to see the window in which this can occur 
> reduced, possibly by having the backup check again whether the master is 
> available just before going live.  It might also be necessary to have the 
> master check for a duplicate server as a last step before going live.
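
For illustration only, the "check again just before going live" idea from the description could be sketched roughly as below. This is a minimal sketch, not Artemis code: `GoLiveGuard`, `activate`, `peerIsLive`, `prepare` and `goLive` are all hypothetical names, and the probe is assumed to be some way of asking whether another live server is currently visible. It narrows the window rather than closing it, since both nodes could still pass the final check at the same moment.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these names are Artemis APIs. */
final class GoLiveGuard {

    /** Outcome of one activation attempt. */
    enum Outcome { WENT_LIVE, PEER_ALREADY_LIVE }

    /**
     * Checks for a live peer both before activation starts and again
     * as the very last step before this node goes live.
     *
     * @param peerIsLive assumed probe: true if another live server is visible
     * @param prepare    the (possibly slow) activation work
     * @param goLive     the final step that makes this node live
     */
    static Outcome activate(BooleanSupplier peerIsLive, Runnable prepare, Runnable goLive) {
        // Initial check: the one that exists today in the reported sequence.
        if (peerIsLive.getAsBoolean()) {
            return Outcome.PEER_ALREADY_LIVE;
        }

        // Activation work; the peer may become live while this runs.
        prepare.run();

        // Extra check right before going live, as proposed in the issue.
        if (peerIsLive.getAsBoolean()) {
            return Outcome.PEER_ALREADY_LIVE;
        }

        goLive.run();
        return Outcome.WENT_LIVE;
    }
}
```

If the peer becomes live while `prepare` is running, `activate` returns `PEER_ALREADY_LIVE` and `goLive` is never called; the same guard could be applied on both the backup and the restarted master.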



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
