[jira] [Updated] (ARTEMIS-3340) Replicated Journal quorum-based logical timestamp

Francesco Nigro (Jira) Thu, 10 Jun 2021 03:18:04 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Francesco Nigro updated ARTEMIS-3340:
-------------------------------------
    Description: 
Shared-nothing replication, both classic and using pluggable quorum vote, can 
cause journal misalignment despite no split-brain events.

Scenario without network partitions/outages:
# Master/Primary starts as live; clients connect to it
# Backup becomes an in-sync replica
# User stops the live broker and the backup fails over
# Backup serves clients, modifying its journal
# User stops the backup
# User starts master/primary: it becomes live with a journal misaligned with 
the most up-to-date one

The main cause of this scenario is that we allow a single broker to serve 
clients. 
A secondary cause is that the quorum service (embedded on classic, pluggable on 
https://issues.apache.org/jira/browse/ARTEMIS-2716) 
only prevents multiple brokers from being live at the same time: it does not 
consider the journal content, so it cannot guarantee that only the broker with 
the most up-to-date data is allowed to become live.

A possible solution is to leverage 
https://issues.apache.org/jira/browse/ARTEMIS-2716 and store a "logical 
timestamp" that marks the age of a broker's journal, so that only the broker 
with the most up-to-date journal is allowed to become a proper live.
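
As a rough illustration of the idea, the sketch below replays the scenario above 
with a journal age attached to each broker. All names (JournalAgeSketch, 
QuorumStore, Broker, canBecomeLive) and the use of a plain counter as the 
"logical timestamp" are assumptions made for this example; they are not the 
ARTEMIS-2716 API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types exist in Artemis.
class JournalAgeSketch {

    // Stand-in for the quorum service: it remembers the age of the most up-to-date journal.
    static final class QuorumStore {
        private final AtomicLong latestAge = new AtomicLong(0);

        long latestAge() { return latestAge.get(); }

        long increment() { return latestAge.incrementAndGet(); }
    }

    static final class Broker {
        final String name;
        long journalAge; // a real broker would persist this next to its journal

        Broker(String name) { this.name = name; }

        // A broker may activate only if its journal is as recent as the one the quorum knows about.
        boolean canBecomeLive(QuorumStore quorum) {
            return journalAge == quorum.latestAge();
        }

        // Becoming live bumps the age before any client is served.
        void becomeLive(QuorumStore quorum) {
            if (!canBecomeLive(quorum)) {
                throw new IllegalStateException(name + " has a stale journal");
            }
            journalAge = quorum.increment();
        }
    }

    public static void main(String[] args) {
        QuorumStore quorum = new QuorumStore();
        Broker primary = new Broker("primary");
        Broker backup = new Broker("backup");

        primary.becomeLive(quorum);              // primary starts as live (age 1)
        backup.journalAge = primary.journalAge;  // backup becomes an in-sync replica
        backup.becomeLive(quorum);               // primary stopped, backup fails over (age 2)
        // backup stopped, primary restarted with its old journal
        System.out.println("primary can become live: " + primary.canBecomeLive(quorum)); // false
    }
}
```

With the age check in place, the restarted primary is refused activation 
instead of silently going live with the older journal.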

In case of a quorum service restart/outage, the admin can use a 
command/configuration to let a broker ignore the age of its journal and just 
force it to start.
In addition, admins need journal CLI commands to inspect the age of a broker's 
journal, for troubleshooting purposes.
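
One way to picture the override: the activation decision takes the local 
journal age, the age reported by the quorum service (absent while the service 
is unreachable) and an admin-supplied force flag. The enum, the method and the 
flag are hypothetical names for this sketch; staying passive when the age 
cannot be verified is also an assumption, not something this issue mandates.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the activation decision; not an existing Artemis API.
class ActivationDecisionSketch {

    enum Decision { BECOME_LIVE, STAY_PASSIVE, FORCED_LIVE }

    // quorumAge is empty when the quorum service cannot be reached (restart/outage).
    static Decision decide(long localAge, OptionalLong quorumAge, boolean forceStart) {
        if (forceStart) {
            // Admin override: ignore the journal age entirely.
            return Decision.FORCED_LIVE;
        }
        if (quorumAge.isPresent() && quorumAge.getAsLong() == localAge) {
            return Decision.BECOME_LIVE;
        }
        // Stale journal, or no way to verify it: do not serve clients.
        return Decision.STAY_PASSIVE;
    }

    public static void main(String[] args) {
        System.out.println(decide(2, OptionalLong.of(2), false));   // BECOME_LIVE
        System.out.println(decide(1, OptionalLong.of(2), false));   // STAY_PASSIVE
        System.out.println(decide(1, OptionalLong.empty(), true));  // FORCED_LIVE
    }
}
```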

It's very important to capture every possible event that causes the journal 
age to change, 
e.g. 
# live broker sends its journal files to a not-yet-in-sync replica backup, 
along with its "journal age"
# backup is now ready to fail over at any moment
# a network partition happens 
# backup tries to become live for vote-retries times
# live detects the replication disconnection but is "lucky": it can reach the 
quorum and continues serving clients
# live increments the age of the journal before serving clients
# an outage causes live to die
# network partition is restored
# backup detects that the live journal age no longer matches its own journal: 
it stops trying to become live
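
The backup side of the sequence above can be sketched as a bounded retry loop. 
This is a simplification: it folds the vote retries and the post-partition 
check into one loop, and the names (tryFailover, Outcome, the Supplier that 
reads the age from the quorum service) are made up for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.OptionalLong;
import java.util.function.Supplier;

// Hypothetical sketch of a backup deciding whether it may fail over.
class BackupFailoverSketch {

    enum Outcome { BECAME_LIVE, GAVE_UP_UNREACHABLE, STOPPED_STALE_JOURNAL }

    // readQuorumAge returns empty while the quorum service is unreachable (network partition).
    static Outcome tryFailover(long backupAge, int voteRetries, Supplier<OptionalLong> readQuorumAge) {
        for (int attempt = 0; attempt < voteRetries; attempt++) {
            OptionalLong quorumAge = readQuorumAge.get();
            if (!quorumAge.isPresent()) {
                continue; // still partitioned: retry
            }
            // The live broker may have bumped the age while the backup was cut off.
            return quorumAge.getAsLong() == backupAge
                    ? Outcome.BECAME_LIVE
                    : Outcome.STOPPED_STALE_JOURNAL;
        }
        return Outcome.GAVE_UP_UNREACHABLE;
    }

    public static void main(String[] args) {
        // The backup replicated the journal at age 1. During the partition the live broker
        // reached the quorum and incremented the age to 2, then died; the partition heals
        // on the backup's third attempt.
        Iterator<OptionalLong> reads = Arrays.asList(
                OptionalLong.empty(), OptionalLong.empty(), OptionalLong.of(2)).iterator();
        System.out.println(tryFailover(1, 3, reads::next)); // STOPPED_STALE_JOURNAL
    }
}
```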

The key parts related to journal age/version are:
* only the live broker can change the journal version (with a monotonic 
increment)
* every breaking-point event must cause the journal age/version to change, e.g. 
starting as live, losing its backup, etc.
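
A minimal sketch of the two rules above, assuming the version is a single 
counter updated with compare-and-set: a broker that does not hold the current 
version (so it is not the current live) cannot change it, and each successful 
change is an increment by one. JournalVersionSketch and bump are hypothetical 
names, and an in-memory AtomicLong stands in for whatever the quorum service 
would actually store.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a version counter that only moves forward, one step at a time.
class JournalVersionSketch {

    private final AtomicLong version = new AtomicLong(0);

    long current() {
        return version.get();
    }

    // Returns the new version, or -1 if the caller's view of the version is outdated.
    long bump(long expected) {
        return version.compareAndSet(expected, expected + 1) ? expected + 1 : -1;
    }

    public static void main(String[] args) {
        JournalVersionSketch journal = new JournalVersionSketch();
        long v = journal.bump(0);                 // starting as live
        v = journal.bump(v);                      // losing its backup
        System.out.println(v);                    // 2
        System.out.println(journal.bump(1));      // -1: a broker with an old version is refused
        System.out.println(journal.current());    // 2
    }
}
```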

There is a high chance we should rethink how roles work (and maybe deprecate 
some behavior): 
a restarted backup whose journal age is the highest one (i.e. it was the last 
live broker) is going to "rotate" its data/journal files while starting, 
ready to serve as a replica of some other live broker, if any; *the broker with 
the most up-to-date data is not interested in being live, because of its 
role(!).*

The admin should inspect the journal age while the backup is still stopped, 
compare it with the other brokers, and then:
* change its role to primary/master to give it the chance to become live
* copy its data into the primary, so that the primary has the most up-to-date 
data
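
The comparison itself is simple once the ages are known. The sketch below 
assumes the admin has already collected one age per broker (for instance with 
the journal CLI commands requested above, which do not exist yet); the class 
and method names are invented for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: pick the broker whose journal has the highest age.
class JournalAgeCompareSketch {

    // Throws NoSuchElementException on an empty map; ties resolve to one of the tied brokers.
    static String mostUpToDate(Map<String, Long> agesByBroker) {
        return Collections.max(agesByBroker.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Long> ages = new LinkedHashMap<>();
        ages.put("primary", 1L); // restarted with its old journal
        ages.put("backup", 2L);  // was the last live broker
        System.out.println(mostUpToDate(ages)); // backup
    }
}
```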







  was:
Shared-nothing replication, both classic and using pluggable quorum vote, can 
cause journal misalignment despite no split-brain events.

Scenario without network partitions/outages:
# Master/Primary start as live, clients connect to it
# Backup become an in-sync replica
# User stop live and backup failover to it
# Backup serve clients, modifying its journal
# User stop backup
# User start master/primary: it become live with a journal misaligned with the 
most up-to-date one

The main cause of this scenario is because we allow a single broker to server 
clients. 
A secondary cause is that the quorum service (embedded on classic, pluggable on 
https://issues.apache.org/jira/browse/ARTEMIS-2716) 
just take care to prevent multiple brokers to be live at the same time, but it 
won't consider the journal content to guarantee that only the broker with the 
most up-to-date data should be allowed to become live.

A possible solution is to leverage on 
https://issues.apache.org/jira/browse/ARTEMIS-2716 and store a "logical 
timestamp" that mark the age of the journal of a broker in order to allow only 
the one with the most up-to-date one to become a proper live.
It means that in case of quorum service restart/outage, the admin can use 
command/configuration to let a broker to ignore the age of its journal and just 
force it to start.
In addition admins need journal CLI commands to inspect the age of a broker 
journal, for troubleshooting reasons.

It's very important to capture every possible event that cause the journal age 
to change
eg 
# live broker send its journal file to a not yet in sync replica backup, along 
with its "journal age"
# backup is now ready to failover in any moment
# a network partition happen 
# backup try to become live for vote-retries times
# live detect replication disconnection but is "lucky" that can reach the 
quorum and continue serving clients
# live increment the age of the journal before serving clients
# an outage cause live to die
# network partition is restored
# backup detect live journal age is no longer matching its own journal: stop 
trying to become live

The key parts related to journal age/version are:
* only who's live can change journal version (with a monotonic increment)
* every breaking point event must cause journal age/version to change eg 
starting as live, loosing its backup, etc etc

There is an high chance we should re-think how roles works (and maybe 
deprecating some behaviour): 
a restarted backup which journal age was the highest one (ie was the last live 
broker) is going to "rotate" its data/journal files, 
ready to serve as replica of some live; it means that the broker with the most 
up to date data isn't not interested to be live.
Admin should inspect journal age while backup is still stopped and:
* change its role into primary/master to give it the chance to become live
* copy its data into the primary in order to grant it to have the most up to 
date data








> Replicated Journal quorum-based logical timestamp
> -------------------------------------------------
>
>                 Key: ARTEMIS-3340
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3340
>             Project: ActiveMQ Artemis
>          Issue Type: Improvement
>            Reporter: Francesco Nigro
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
