[jira] [Updated] (ARTEMIS-3340) Replicated Journal quorum-based logical timestamp

Francesco Nigro (Jira) Thu, 10 Jun 2021 03:02:04 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Francesco Nigro updated ARTEMIS-3340:
-------------------------------------
    Description: 
Shared-nothing replication, both classic and using the pluggable quorum vote, 
can cause journal misalignment even when no split-brain event occurs.

Scenario without network partitions/outages:
# Master/Primary starts as live; clients connect to it
# Backup becomes an in-sync replica
# User stops the live broker and the backup fails over
# Backup serves clients, modifying its journal
# User stops the backup
# User starts the master/primary: it becomes live with a journal that is 
misaligned with the most up-to-date one

The main cause of this scenario is that we allow a single broker to serve 
clients. 
A secondary cause (for other journal misalignment cases) is that the quorum 
service (embedded on classic, pluggable on 
https://issues.apache.org/jira/browse/ARTEMIS-2716) 
only takes care of preventing multiple brokers from being live at the same 
time; it does not consider journal state to guarantee that only the broker 
with the most up-to-date journal is allowed to be live.

For a backup broker with no primary/master around, this makes sense, but it 
means that a bad restart/retry ordering can let a broker with a stale journal 
win the race to become live.

A possible solution is to leverage 
https://issues.apache.org/jira/browse/ARTEMIS-2716 and store a "logical 
timestamp" that marks the age of a broker's journal, so that only the broker 
with the most up-to-date journal is allowed to become a proper live.
This means that, in case of a quorum service restart/outage, the admin must 
have some command/configuration to let a broker ignore the age of its journal 
and force it to start.
In addition, some new journal CLI commands must be exposed to inspect the age 
of a broker's journal, for troubleshooting purposes.
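
As a rough illustration of the activation check described above, here is a 
minimal Java sketch. It is not Artemis code: JournalAgeGate, Decision, advance 
and tryActivate are hypothetical names, and the quorum service is replaced by a 
local AtomicLong, whereas the real value would live in the shared store from 
ARTEMIS-2716. The sketch also assumes that "most up-to-date" means the local 
journal age equals the shared one, and that the force option re-seeds the 
shared value from the local journal.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of an activation gate; none of these names exist in Artemis. */
final class JournalAgeGate {

    /** Possible outcomes of an activation attempt. */
    enum Decision { BECOME_LIVE, STALE_JOURNAL, FORCED }

    // Stand-in for the logical timestamp kept by the quorum service (assumption:
    // the real store would be shared between brokers, not a local field).
    private final AtomicLong sharedJournalAge = new AtomicLong();

    /** Called by the live broker to publish a new journal age; returns the new value. */
    long advance() {
        return sharedJournalAge.incrementAndGet();
    }

    /** Decides whether a broker holding a journal of the given age may become live. */
    Decision tryActivate(long localJournalAge, boolean force) {
        if (force) {
            // Admin override, e.g. after the quorum service lost its state:
            // trust the local journal and re-seed the shared value from it.
            sharedJournalAge.set(localJournalAge);
            return Decision.FORCED;
        }
        return localJournalAge == sharedJournalAge.get()
                ? Decision.BECOME_LIVE
                : Decision.STALE_JOURNAL;
    }

    public static void main(String[] args) {
        JournalAgeGate gate = new JournalAgeGate();
        long staleAge = gate.advance();    // primary starts as live, then is stopped
        long currentAge = gate.advance();  // backup takes over and bumps the age
        System.out.println(gate.tryActivate(staleAge, false));   // STALE_JOURNAL
        System.out.println(gate.tryActivate(currentAge, false)); // BECOME_LIVE
        System.out.println(gate.tryActivate(staleAge, true));    // FORCED
    }
}
```

With such a gate, the restarted master/primary from the first scenario would be 
refused activation unless the admin explicitly forces it.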

It's very important to capture every possible event that causes the journal 
age to change, e.g.:
# live broker sends its journal files to a not-yet-in-sync replica backup, 
along with its "journal age"
# backup is now ready to fail over at any moment
# a network partition happens
# backup tries to become live, retrying vote-retries times
# live detects the replication disconnection but is "lucky": it can still 
reach the quorum and continues serving clients
# live increments the age of its journal
# an outage causes live to die
# network partition is restored
# backup detects that the journal age no longer matches that of its own 
journal: it stops trying to become live
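
The steps above can be walked through with a small Java simulation. This is 
only a sketch under stated assumptions: PartitionScenario and Broker are 
hypothetical names, the quorum-stored age is a plain local variable, and the 
backup's retries during the partition are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical walk-through of the partition scenario; not Artemis code. */
final class PartitionScenario {

    /** Minimal broker model: only the journal age it holds locally. */
    static final class Broker {
        final String name;
        long journalAge;
        Broker(String name) { this.name = name; }
    }

    /** Runs the steps and returns a log of what happened. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        long quorumAge = 1;  // age recorded in the quorum service (assumed starting value)
        Broker live = new Broker("live");
        Broker backup = new Broker("backup");
        live.journalAge = quorumAge;

        // Initial replication: the backup receives the journal files and the journal age.
        backup.journalAge = live.journalAge;
        log.add(backup.name + " in sync at age " + backup.journalAge);

        // Partition: live loses its backup but still reaches the quorum, so it bumps the age.
        quorumAge++;
        live.journalAge = quorumAge;
        log.add(live.name + " lost its backup, age now " + live.journalAge);

        // Live dies, the partition heals, and the backup checks its age before activating.
        boolean backupMayActivate = backup.journalAge == quorumAge;
        log.add(backupMayActivate
                ? backup.name + " becomes live"
                : backup.name + " stops trying to become live");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```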

The key parts related to journal age/version are:
* only the live broker can change the journal version (with a monotonic 
increment)
* every breaking-point event must cause the journal age/version to change, 
e.g. starting as live, losing its backup, etc.
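
These two rules could be enforced along the following lines. Again a sketch 
only: JournalVersion, Role, BreakingEvent and onEvent are hypothetical names, 
and the two events listed are just the examples given above, not a complete 
set.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the journal version rules; not Artemis code. */
final class JournalVersion {

    enum Role { LIVE, BACKUP }

    /** Example breaking-point events taken from the description; the real list may be longer. */
    enum BreakingEvent { STARTED_AS_LIVE, LOST_BACKUP }

    private final AtomicLong version = new AtomicLong();

    long current() {
        return version.get();
    }

    /** Bumps the version by one for a breaking-point event; only a live broker may do so. */
    long onEvent(Role role, BreakingEvent event) {
        if (role != Role.LIVE) {
            throw new IllegalStateException(
                    "only the live broker may change the journal version (event: " + event + ")");
        }
        return version.incrementAndGet();
    }

    public static void main(String[] args) {
        JournalVersion journal = new JournalVersion();
        System.out.println(journal.onEvent(Role.LIVE, BreakingEvent.STARTED_AS_LIVE)); // 1
        System.out.println(journal.onEvent(Role.LIVE, BreakingEvent.LOST_BACKUP));     // 2
        try {
            journal.onEvent(Role.BACKUP, BreakingEvent.LOST_BACKUP);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```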

  was:
Shared-nothing replication, both classic and using pluggable quorum vote, can 
cause journal misalignment despite no split-brain events.

Scenario without network partitions/outages:
# Master/Primary start as live, clients connect to it
# Backup become an in-sync replica
# User stop live and backup failover to it
# Backup serve clients, modifying its journal
# User stop backup
# User start master/primary: it become live with a journal misaligned with the 
most up-to-date one

The main cause of this scenario is because we allow a single broker to server 
clients, despite configured with HA. 
A secondary cause (for other journal misalignment cases) is that the quorum 
service (embedded on classic, pluggable on 
https://issues.apache.org/jira/browse/ARTEMIS-2716) 
just take care to prevent multiple brokers to be live at the same time, but it 
won't consider journal state to guarantee that only the broker with the most 
up-to-date journal is allowed to be live.

For a backup broker, with no primary/master around, makes sense, but this can 
cause bad restart/retry ordering to let a broker with a stale journal to win 
the race to become live.

A possible solution is to leverage on 
https://issues.apache.org/jira/browse/ARTEMIS-2716 and store a "logical 
timestamp" that mark the age of the journal of a broker in order to allow only 
the one with the most up-to-date one to become a proper live.
It means that in case of quorum service restart/outage, the admin must have 
some command/configuration to let a broker to ignore the age of its journal and 
just force it to start.
In addition must be exposed some new journal CLI commands to inspect the age of 
a broker journal, for troubleshooting reasons.

Its very important to capture every possible event that cause the journal age 
to change
eg 
# live broker send its journal file to a not yet in sync replica backup, along 
with its "journal age"
# backup is now ready to failover in any moment
# a network partition happen 
# backup try to become live for vote-retries times
# live detect replication disconnection but is "lucky" that can reach the 
quorum and continue serving clients
# live increment the age of its journal
# an outage cause live to die
# network partition is restored
# backup detect that journal age is no longer matching its own journal: it stop 
trying to become live

The key parts related to journal age/version are:
* only who's live can change journal version (with a monotonic increment)
* every breaking point event must cause journal age/version to change eg 
starting as live, loosing its backup, etc etc

> Replicated Journal quorum-based logical timestamp
> -------------------------------------------------
>
>                 Key: ARTEMIS-3340
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3340
>             Project: ActiveMQ Artemis
>          Issue Type: Improvement
>            Reporter: Francesco Nigro
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
