[jira] [Commented] (ARTEMIS-3340) Replicated Journal quorum-based logical timestamp

Gary Tully (Jira) Thu, 10 Jun 2021 02:53:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360699#comment-17360699
 ]


Gary Tully commented on ARTEMIS-3340:
-------------------------------------

It may be worth considering a more restrictive (and smaller/simpler) set of 
criteria.

Disallow a transition from REPLICATED to UNREPLICATED with the same lock 
version. That is, a LIVE that is REPLICATED (has an in-sync replica/backup) 
must exit or release the journal lock if it is no longer REPLICATED. In other 
words, a failover event must occur and the lock version must increment.

 

To proceed, either the in-sync replica gets the lock with a matching version 
and increments it, or the restarted live gets the lock with a matching version 
and increments it. That is, both behave the same. If they get the lock and 
cannot increment the version, they exit/release.
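
A minimal sketch of that rule, assuming an in-memory stand-in for the shared 
lock. VersionedLock, Activation and tryActivate are hypothetical names used 
only for illustration; they are not Artemis or quorum-service APIs.

```java
// Hypothetical stand-in for the shared, versioned lock (not an Artemis API).
final class VersionedLock {
    private long version = 0;   // 0 stands for an "empty" lock
    private String owner;       // null while the lock is free

    synchronized boolean tryAcquire(String broker) {
        if (owner != null) {
            return false;
        }
        owner = broker;
        return true;
    }

    synchronized void release(String broker) {
        if (broker.equals(owner)) {
            owner = null;
        }
    }

    // The increment succeeds only for the lock holder, and only when the
    // version recorded in its journal matches the lock version.
    synchronized boolean incrementIfMatches(String broker, long journalVersion) {
        if (!broker.equals(owner) || version != journalVersion) {
            return false;
        }
        version++;
        return true;
    }

    synchronized long version() {
        return version;
    }
}

final class Activation {
    // Live and backup run the same steps. Returns the new lock version if the
    // broker may become live, or -1 if it must exit/release.
    static long tryActivate(VersionedLock lock, String broker, long journalVersion) {
        if (!lock.tryAcquire(broker)) {
            return -1;
        }
        if (!lock.incrementIfMatches(broker, journalVersion)) {
            lock.release(broker);   // stale journal: give the lock back
            return -1;
        }
        return lock.version();
    }
}
```

With this sketch, a restarted live whose journal is at ver=1 is refused once 
the backup has moved the lock to ver=2.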

 

This means that if the most up-to-date journal is not available, we are 
unavailable. But we never have a journal out of sync.

For the single pair scenario, if there is a need for HA to survive two 
failures, introduce a second pair.

 

This greatly simplifies logic and state. Let me try to reason via text ... (I 
think that is the test of whether it is understandable).

The lock has an auto-incrementing version; journal updates only occur while 
holding the lock, and they record the version. Imagine every journal record 
having an additional long field.

An empty journal needs an empty lock. The first journal append will have ver=1. 
Live replicates to Backup, which stores ver=1 in its journal. The next 
incarnation of the journal (either restarted live or backup) will use ver=2, 
etc.
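
To make the "additional long field" concrete, here is a sketch with a 
hypothetical in-memory journal. VersionedRecord, VersionedJournal and 
replicateTo are illustrative names, not the Artemis journal API, and 
replication is assumed to copy only the suffix the backup is missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical journal record: the payload plus the extra long field.
final class VersionedRecord {
    final long lockVersion;
    final String payload;

    VersionedRecord(long lockVersion, String payload) {
        this.lockVersion = lockVersion;
        this.payload = payload;
    }
}

// Hypothetical in-memory journal; a real journal is file-backed.
final class VersionedJournal {
    private final List<VersionedRecord> records = new ArrayList<>();

    // Appends happen only while holding the lock and are stamped with the
    // lock version of the current incarnation.
    void append(long lockVersion, String payload) {
        if (lockVersion < latestVersion()) {
            throw new IllegalArgumentException("lock version went backwards");
        }
        records.add(new VersionedRecord(lockVersion, payload));
    }

    // 0 stands for an empty journal, which pairs with an empty lock.
    long latestVersion() {
        return records.isEmpty() ? 0 : records.get(records.size() - 1).lockVersion;
    }

    // Assumes the backup already holds a prefix of this journal; copies the
    // missing records, versions included.
    void replicateTo(VersionedJournal backup) {
        for (int i = backup.records.size(); i < records.size(); i++) {
            backup.records.add(records.get(i));
        }
    }
}
```

After live appends at ver=1 and replicates, both journals report ver=1; 
whichever broker activates next appends at ver=2, and the other journal is 
then visibly behind.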

 

This raises a question: what if we get the lock at ver=1 and increment to 
ver=2, but fail to record any journal record with ver=2? No journal matches 
the lock version any more -> we are unavailable!

 

We need to roll back the lock version to proceed. That is the downside of 
having to coordinate, but it is the only way to be safe.
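
One possible reading of that rollback, as a sketch only: the broker that 
incremented the version undoes the increment itself if its first append 
fails. GuardedActivation and FirstAppend are hypothetical names, and an 
AtomicLong stands in for the shared lock version. This does not cover a 
broker that crashes between the increment and the first append; in that case 
the lock stays ahead of every journal.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical callback for the first journal append of a new incarnation.
interface FirstAppend {
    void run(long version) throws Exception;
}

final class GuardedActivation {
    // Returns true if the broker became live at journalVersion + 1.
    static boolean activate(AtomicLong lockVersion, long journalVersion, FirstAppend firstAppend) {
        long next = journalVersion + 1;
        // Increment only if the lock version matches our journal.
        if (!lockVersion.compareAndSet(journalVersion, next)) {
            return false; // stale journal: exit/release
        }
        try {
            firstAppend.run(next);
            return true;
        } catch (Exception e) {
            // Nothing was recorded at the new version, so roll the lock back
            // to let a holder of journalVersion try again.
            lockVersion.compareAndSet(next, journalVersion);
            return false;
        }
    }
}
```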

 

Is there any way to be smarter here?

 

> Replicated Journal quorum-based logical timestamp
> -------------------------------------------------
>
>                 Key: ARTEMIS-3340
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3340
>             Project: ActiveMQ Artemis
>          Issue Type: Improvement
>            Reporter: Francesco Nigro
>            Priority: Major
>
> Shared-nothing replication, both classic and using pluggable quorum vote, can 
> cause journal misalignment despite no split-brain events.
> Scenario without network partitions/outages:
> # Master/Primary starts as live, clients connect to it
> # Backup becomes an in-sync replica
> # User stops live and backup fails over
> # Backup serves clients, modifying its journal
> # User stops backup
> # User starts master/primary: it becomes live with a journal misaligned with 
> the most up-to-date one
> The main cause of this scenario is that we allow a single broker to serve 
> clients, despite being configured with HA. 
> A secondary cause (for other journal misalignment cases) is that the quorum 
> service (embedded on classic, pluggable on 
> https://issues.apache.org/jira/browse/ARTEMIS-2716) 
> only takes care of the mutually exclusive presence of a broker in the live 
> role, without considering ordering. 
> For a backup broker with no primary/master around this makes sense, but bad 
> restart/retry ordering can let a broker with a stale journal win the race 
> to become live.
> A possible solution is to leverage 
> https://issues.apache.org/jira/browse/ARTEMIS-2716 and store a "logical 
> timestamp" that marks the age of a broker's journal, in order to allow 
> only the broker with the most up-to-date journal to become a proper live.
> It means that in case of quorum service restart/outage, the admin must have 
> some command/configuration to let a broker ignore the age of its journal 
> and force it to start.
> In addition, some new journal CLI commands must be exposed to inspect the 
> age of a broker journal, for troubleshooting reasons.
> It's very important to capture every possible event that causes the journal 
> age to change, 
> e.g. 
> # live broker sends its journal files to a not-yet-in-sync replica backup, 
> along with its "journal age"
> # backup is now ready to fail over at any moment
> # a network partition happens 
> # backup tries to become live for vote-retries times
> # live detects the replication disconnection but is "lucky" in that it can 
> reach the quorum and continue serving clients
> # live increments the age of its journal
> # an outage causes live to die
> # network partition is restored
> # backup detects that the journal age no longer matches its own journal: it 
> stops trying to become live
> The key parts related to journal age/version are:
> * only the live broker can change the journal version (with a monotonic 
> increment)
> * every breaking-point event must cause the journal age/version to change, 
> e.g. starting as live, losing its backup, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
