[jira] [Work logged] (ARTEMIS-2716) Implements pluggable Quorum Vote

ASF GitHub Bot (Jira) Thu, 03 Jun 2021 06:24:09 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-2716?focusedWorklogId=605910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605910
 ]


ASF GitHub Bot logged work on ARTEMIS-2716:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/21 13:23
            Start Date: 03/Jun/21 13:23
    Worklog Time Spent: 10m 
      Work Description: gtully commented on pull request #3555:
URL: https://github.com/apache/activemq-artemis/pull/3555#issuecomment-853865256


   Sorry, I need to give some more context. What I am thinking is that we can 
drop failback b/c it is no longer necessary, and I am assuming some additional 
state from the lock. 
   
   Let me elaborate and see if this makes sense/holds water!
   
   The bottom line here is: we must avoid unilateral decisions b/c our journal 
cannot reconcile concurrent writes on different nodes, i.e. we cannot deal with 
split brain, so we must avoid it.
   With a simple distributed lock, there is an implicit timeout or epoch. If we 
track the current epoch we can coordinate our local state transitions and avoid 
unilateral decisions.
   
   In a simpler state model, a broker transitions from:
    BROKER -> PRIMARY -> REPLICATED
   or:
    BROKER -> BACKUP -> IN_SYNC_REPLICA
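   
   As a rough illustration only, the two paths above could be written down as 
an enum with its allowed forward transitions. The names BrokerState, next and 
canMoveTo are hypothetical and are not taken from the pull request; the sketch 
covers only the two start-up paths listed, not failover or restart.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: state names mirror the comment, everything else is assumed.
enum BrokerState {
    BROKER, PRIMARY, REPLICATED, BACKUP, IN_SYNC_REPLICA;

    // Forward transitions of the two paths:
    //   BROKER -> PRIMARY -> REPLICATED
    //   BROKER -> BACKUP  -> IN_SYNC_REPLICA
    Set<BrokerState> next() {
        switch (this) {
            case BROKER:
                return EnumSet.of(PRIMARY, BACKUP);
            case PRIMARY:
                return EnumSet.of(REPLICATED);
            case BACKUP:
                return EnumSet.of(IN_SYNC_REPLICA);
            default:
                // REPLICATED and IN_SYNC_REPLICA end the two paths in this sketch.
                return EnumSet.noneOf(BrokerState.class);
        }
    }

    // True if 'target' is a direct successor of this state on either path.
    boolean canMoveTo(BrokerState target) {
        return next().contains(target);
    }
}
```

   For example, BrokerState.BROKER.canMoveTo(BrokerState.PRIMARY) holds, while 
BrokerState.BACKUP.canMoveTo(BrokerState.REPLICATED) does not.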
   
   On restart:
   For the next 'in sequence' epoch, REPLICATED and IN_SYNC_REPLICA are 
identical: either can become PRIMARY, so they race. The winner becomes PRIMARY 
and the loser BACKUP.
   In this case, delaying the exit of one can replicate or force 'failback'.
   
   The more interesting case is when an epoch is skipped: our local state is 
stale, so we revert to BROKER and exit in the hope/knowledge/expectation that 
PRIMARY will restart. PRIMARY is no longer REPLICATED and the IN_SYNC_REPLICA 
is stale. PRIMARY must restart.
   
   Here is the crux: we trade off availability for consistency. The only 
failover that can happen is from REPLICATED to an IN_SYNC_REPLICA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 605910)
    Time Spent: 9.5h  (was: 9h 20m)

> Implements pluggable Quorum Vote
> --------------------------------
>
>                 Key: ARTEMIS-2716
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2716
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>         Attachments: backup.png, primary.png
>
>          Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> This task aims to deliver a new Quorum Vote mechanism for Artemis with the 
> following objectives:
> # to make it pluggable
> # to cleanly separate the election phase and the cluster member states
> # to simplify the most common setups in both amount of configuration and 
> requirements (e.g. "witness" nodes could be implemented to support single 
> master-slave pairs)
> Post-actions to help people adopt it, which need to be thought through up front:
> # a clean upgrade path for current HA replication users
> # deprecate or integrate the current HA replication into the new version



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
