[jira] [Comment Edited] (ARTEMIS-2716) Implements pluggable Quorum Vote

Francesco Nigro (Jira) Mon, 14 Jun 2021 10:41:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362898#comment-17362898
 ]


Francesco Nigro edited comment on ARTEMIS-2716 at 6/14/21, 5:40 PM:
--------------------------------------------------------------------

I'm going to:
 # *remove the initial loop on primary start*: a primary start should either 
succeed or fail (with errors), which is key for admin purposes. Admins are 
supposed to check the broker/machine state before restarting, so this is not 
just an automated operation: it needs to be supervised
 # *deprecate/document allow-failback*: allow-failback == false turns a 
failing-back primary into a backup that can just error out on failover errors.
 In classic replication, a failing-back master forgets its NodeID if any error 
happens on failover and restarts as an empty backup. On broker restart, it gets 
a different NodeID and becomes live.

 

The latter decision has been made to enforce what the primary role is meant to 
be: mostly a live candidate and an occasional/temporary backup, ready to 
failback ASAP.
 On a failure during the failback process it's perfectly fine to fail fast, 
given that failback should be an all-or-nothing admin operation.

For a failure during a proper failover (because the backup acting as live has 
rejected the initial failback request) it is still uncertain which behaviour 
should follow:
 * a natural-born backup just searches for other lives to pair/sync with
 * a primary is probably fine to just stop, because there is no point in 
restarting as primary (and risking becoming live with a misaligned journal) or 
in behaving like a natural-born backup, i.e. the behaviour mentioned above
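
To make the proposal easier to discuss, the decisions above can be sketched as a 
small decision table. This is only an illustration, not code from the PR: the 
names Role, FailurePhase, Action and onFailure are made up for this example, 
and the STOP outcome for a primary failing over is the tentative option 
described above, not a settled behaviour.

```java
public class FailoverDecisionSketch {

    /** How the broker was configured to start (hypothetical names). */
    enum Role { PRIMARY, NATURAL_BORN_BACKUP }

    /** The operation during which the failure happened (hypothetical names). */
    enum FailurePhase { START, FAILBACK, FAILOVER }

    /** What the broker does after the failure (hypothetical names). */
    enum Action { FAIL_FAST, SEARCH_OTHER_LIVE, STOP }

    static Action onFailure(Role role, FailurePhase phase) {
        if (role == Role.NATURAL_BORN_BACKUP) {
            // The comment only describes failover failures for a natural-born backup.
            if (phase != FailurePhase.FAILOVER) {
                throw new IllegalArgumentException("not described for a natural-born backup: " + phase);
            }
            // A natural-born backup just looks for another live to pair/sync with.
            return Action.SEARCH_OTHER_LIVE;
        }
        switch (phase) {
            case START:
                // No initial retry loop: a primary start succeeds or fails with errors.
                return Action.FAIL_FAST;
            case FAILBACK:
                // Failback is an all-or-nothing, admin-supervised operation.
                return Action.FAIL_FAST;
            case FAILOVER:
                // Tentative: stop rather than risk going live with a misaligned journal.
                return Action.STOP;
            default:
                throw new AssertionError(phase);
        }
    }

    public static void main(String[] args) {
        for (FailurePhase phase : FailurePhase.values()) {
            System.out.println("PRIMARY/" + phase + " -> " + onFailure(Role.PRIMARY, phase));
        }
        System.out.println("NATURAL_BORN_BACKUP/FAILOVER -> "
                + onFailure(Role.NATURAL_BORN_BACKUP, FailurePhase.FAILOVER));
    }
}
```

If the PR discussion ends up preferring that a primary behaves like a 
natural-born backup on failover errors, only the FAILOVER branch for PRIMARY 
would change in this sketch.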

 

This change is debatable and we can open a discussion on the PR about it.

 

 


> Implements pluggable Quorum Vote
> --------------------------------
>
>                 Key: ARTEMIS-2716
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2716
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>         Attachments: backup.png, primary.png
>
>          Time Spent: 16h
>  Remaining Estimate: 0h
>
> This task aims to deliver a new Quorum Vote mechanism for Artemis with the 
> following objectives:
> # to make it pluggable
> # to cleanly separate the election phase and the cluster member states
> # to simplify the most common setups in both amount of configuration and 
> requirements (e.g. "witness" nodes could be implemented to support single 
> master-slave pairs)
> Post-actions that help people adopt it, but need to be thought through upfront:
> # a clean upgrade path for current HA replication users
> # deprecate or integrate the current HA replication into the new version



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
