[jira] [Comment Edited] (ARTEMIS-2568) Race condition between failover processing and master restart can cause split brain

Francesco Nigro (Jira) Thu, 02 Apr 2020 23:32:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072483#comment-17072483
 ]


Francesco Nigro edited comment on ARTEMIS-2568 at 4/3/20, 6:31 AM:
-------------------------------------------------------------------

> Sounds a bit more alarming to me to be honest, but this is likely not the 
> correct ticket to have that discussion on.

Better to create a new issue for this, I suppose. Anyway, if there is a 
connectivity loss and you're not using a cluster of at least 3 nodes, the 
quorum vote on the slave (while it is a backup) cannot work as expected and 
you risk split brain, because no one can legitimize the failover.
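
To illustrate why a cluster of at least 3 nodes matters, here is a minimal 
sketch of a generic strict-majority rule. It is not the Artemis quorum vote 
implementation; the class QuorumSketch and the method hasMajority are 
hypothetical names used only for this example.

```java
public class QuorumSketch {

    /**
     * Generic strict-majority rule (an assumption for illustration,
     * not the actual Artemis vote): more than half of the cluster
     * must agree before a failover is considered legitimate.
     */
    public static boolean hasMajority(int votesReceived, int clusterSize) {
        return votesReceived > clusterSize / 2;
    }

    public static void main(String[] args) {
        // Two nodes, link lost: each side can count only its own vote,
        // so neither side can prove that the other one is really down.
        System.out.println("2 nodes, 1 vote:  " + hasMajority(1, 2));

        // Three nodes, one isolated: the two that still see each other
        // form a majority, while the isolated node does not.
        System.out.println("3 nodes, 2 votes: " + hasMajority(2, 3));
        System.out.println("3 nodes, 1 vote:  " + hasMajority(1, 3));
    }
}
```

With two nodes, a single vote is never a majority, so a backup that loses 
contact with its master has nobody to confirm the failure.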


was (Author: [email protected]):
> Sounds a bit more alarming to me to be honest, but this is likely not the 
> correct ticket to have that discussion on.

Better to create a new issue for this, I suppose. Anyway, if there is a 
connectivity loss and you're not using at least 3 lives, the quorum vote on 
the slave while it is a backup won't work as expected and you risk split 
brain, because no one can legitimize the failover.

> Race condition between failover processing and master restart can cause split 
> brain
> -----------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-2568
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2568
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.10.1
>            Reporter: Bob Mitchell
>            Priority: Major
>
> We have seen split brain in the following sequence of events when using 
> replicating backups with failback:
>  # Master fails or is shut down
>  # Backup detects the failure and starts to fail over
>  # Master is restarted before Backup becomes "live"
>  # Its check for a "duplicate" server finds none because the backup is not live yet
>  # Master and backup both become live.
> At the very least, we would like the window in which this can occur to be 
> reduced, possibly by having the backup check again whether the master is 
> available just before going live.  It might also be necessary to have the 
> master check for a duplicate server as a last step before going live as well.
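
The sequence above is a check-then-act race. The sketch below replays it as 
a fixed single-threaded interleaving, then contrasts it with a variant in 
which the check is repeated as the very last step of activation. All names 
(FailoverRaceSketch, naive, withFinalCheck) are hypothetical, and the 
AtomicReference is only a stand-in for whatever coordination a real cluster 
would need; this is not Artemis code, and a real re-check over the network 
would narrow the window rather than close it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class FailoverRaceSketch {

    /** Each node checks early and activates later without re-checking. */
    public static Set<String> naive() {
        Set<String> live = new HashSet<>();
        // Backup detects the failure and starts to fail over.
        boolean backupSawNoLive = live.isEmpty();
        // Master restarts; its duplicate check runs before the backup is live.
        boolean masterSawNoLive = live.isEmpty();
        // Both act on their stale observations.
        if (backupSawNoLive) live.add("backup");
        if (masterSawNoLive) live.add("master");
        return live; // both nodes are live: split brain
    }

    /** The check is repeated atomically as the last step before going live. */
    public static Set<String> withFinalCheck() {
        // Hypothetical shared "live owner" slot, used for illustration only.
        AtomicReference<String> owner = new AtomicReference<>();
        Set<String> live = new HashSet<>();
        // compareAndSet succeeds only for the first node to claim the slot.
        if (owner.compareAndSet(null, "backup")) live.add("backup");
        if (owner.compareAndSet(null, "master")) live.add("master");
        return live; // only one node is live
    }

    public static void main(String[] args) {
        System.out.println("naive:            " + naive().size() + " live node(s)");
        System.out.println("with final check: " + withFinalCheck().size() + " live node(s)");
    }
}
```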



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
