[
https://issues.apache.org/jira/browse/ARTEMIS-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Miroslav Novak updated ARTEMIS-473:
-----------------------------------
Description:
If a master-slave pair is configured with a replicated journal and there are no
other servers in the cluster, then the slave will activate when the network
between master and slave breaks. Depending on whether clients were also
disconnected from the master, they may or may not fail over to the slave. The
problem appears at the moment the network between master and slave is restored:
master and slave are active at the same time, which is the split-brain
syndrome. Currently there is no recovery mechanism to resolve this situation.
Suggested improvement: If clients failed over to the slave, then the master
restarts itself so that failback occurs (if configured). If clients did not
fail over and stayed connected to the master, then the backup restarts itself.
was:
If there are 2 live/backup pairs with replicated journal in a colocated topology
Artemis1(L1/B2) <-> Artemis2(L2/B1), then there is no easy way to start them
once they are all shut down.
The problem is that there is no way to start the servers with the most
up-to-date journal. Suppose the administrator shuts down the servers in
sequence, Artemis1 and then Artemis2. Then Artemis2 has the most up-to-date
journals, because backup B1 on Artemis2 activated.
If the administrator then decides to start Artemis2, live L2 activates and
backup B1 waits for live L1 in Artemis1 to start. But once L1 starts, L1
replicates its own "old" journal to B1.
So L1 has started with a stale journal. I would suggest that L1 and B1 compare
their journals and figure out which one is more up-to-date. The server with
the more up-to-date journal then activates.
In the scenario described above, backup B1 would activate first.
Live L1 would then synchronize its own journal from B1, and failback would happen.
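A minimal sketch of that comparison, under assumptions: the description does
not say how two journals would be compared, so the sketch assumes each journal
exposes a hypothetical, monotonically increasing version number. The names
below are illustrative and are not Artemis APIs; letting the live server win a
tie is also an assumption.

```java
// Hypothetical sketch: pick which server of a live/backup pair activates
// first, based on an assumed per-journal version number.
final class JournalComparisonSketch {

    enum Role { LIVE, BACKUP }

    /**
     * @param liveJournalVersion   assumed version of the live server's journal
     * @param backupJournalVersion assumed version of the backup's journal
     * @return the role that holds the more up-to-date journal and should
     *         therefore activate first
     */
    static Role chooseActivator(long liveJournalVersion, long backupJournalVersion) {
        // Only a strictly newer backup journal overrides the live server;
        // on a tie the live server activates (assumption).
        return backupJournalVersion > liveJournalVersion ? Role.BACKUP : Role.LIVE;
    }
}
```

In the scenario above, B1 would hold the higher version, so
`chooseActivator` would return `Role.BACKUP` and L1 would synchronize from B1
before failback.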
> Resolve split brain data after split brains scenarios.
> ------------------------------------------------------
>
> Key: ARTEMIS-473
> URL: https://issues.apache.org/jira/browse/ARTEMIS-473
> Project: ActiveMQ Artemis
> Issue Type: New Feature
> Components: Broker
> Affects Versions: 1.2.0
> Reporter: Miroslav Novak
> Priority: Critical