[ 
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miroslav Novak updated ARTEMIS-473:
-----------------------------------
    Description: 
If a master-slave pair is configured with a replicated journal and there are no 
other servers in the cluster, then the slave will activate when the network 
between master and slave breaks. Depending on whether clients were also 
disconnected from the master, they may or may not fail over to the slave. The 
problem appears at the moment the network between master and slave is restored: 
master and slave are both active at the same time, which is the split-brain 
syndrome. Currently there is no recovery mechanism to resolve this situation.

Suggested improvement: If clients failed over to the slave, then the master 
restarts itself so that failback occurs (if configured). If clients did not 
fail over and stayed connected to the master, then the slave (backup) restarts 
itself.
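
The suggested rule can be sketched roughly as follows. This is only an 
illustration of the proposal, not existing Artemis code: the class 
SplitBrainResolver, the Decision enum and the two client-count parameters are 
hypothetical names, and the UNDECIDED outcome is an assumption for the cases 
the proposal does not mention (clients on both brokers, or on neither).

```java
public class SplitBrainResolver {

    /** Which broker, if any, should restart itself once the link is restored. */
    public enum Decision { RESTART_MASTER, RESTART_SLAVE, UNDECIDED }

    /**
     * Hypothetical decision rule. The client counts are assumed inputs,
     * not values taken from any existing Artemis API.
     */
    public static Decision decide(int masterClients, int slaveClients) {
        if (slaveClients > 0 && masterClients == 0) {
            // Clients failed over to the slave: the master restarts so failback can occur.
            return Decision.RESTART_MASTER;
        }
        if (masterClients > 0 && slaveClients == 0) {
            // Clients stayed connected to the master: the slave (backup) restarts itself.
            return Decision.RESTART_SLAVE;
        }
        // Clients on both brokers or on neither: not covered by the proposal.
        return Decision.UNDECIDED;
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 3));
        System.out.println(decide(5, 0));
        System.out.println(decide(2, 2));
    }
}
```

How each broker would learn the other's client count after the network is 
restored is left open here, as it is in the proposal.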

  was:
If there are 2 live/backup pairs with replicated journal in a colocated 
topology Artemis1(L1/B2) <-> Artemis2(L2/B1), then there is no easy way to 
start them once they have all been shut down.

The problem is that there is no way to start the servers with the most 
up-to-date journal. Suppose the administrator shuts down the servers in 
sequence, Artemis1 and then Artemis2. Artemis2 then has the most up-to-date 
journals because backup B1 on Artemis2 activated.
If the administrator then decides to start Artemis2, live L2 activates and 
backup B1 waits for live L1 in Artemis1 to start. But once L1 starts, it 
replicates its own "old" journal to B1.

So L1 starts with a stale journal. I would suggest that L1 and B1 compare 
their journals and figure out which one is more up-to-date. The server with 
the more up-to-date journal then activates.

In the scenario described above, it would be backup B1 that activates first. 
Live L1 would synchronize its own journal from B1, and then failback happens.




> Resolve split brain data after split brains scenarios.
> ------------------------------------------------------
>
>                 Key: ARTEMIS-473
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-473
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>          Components: Broker
>    Affects Versions: 1.2.0
>            Reporter: Miroslav Novak
>            Priority: Critical
>
> If a master-slave pair is configured with a replicated journal and there are 
> no other servers in the cluster, then the slave will activate when the 
> network between master and slave breaks. Depending on whether clients were 
> also disconnected from the master, they may or may not fail over to the 
> slave. The problem appears at the moment the network between master and slave 
> is restored: master and slave are both active at the same time, which is the 
> split-brain syndrome. Currently there is no recovery mechanism to resolve 
> this situation.
> Suggested improvement: If clients failed over to the slave, then the master 
> restarts itself so that failback occurs (if configured). If clients did not 
> fail over and stayed connected to the master, then the slave (backup) 
> restarts itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
