[ 
https://issues.apache.org/jira/browse/ARTEMIS-2716?focusedWorklogId=605946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605946
 ]

ASF GitHub Bot logged work on ARTEMIS-2716:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/21 14:16
            Start Date: 03/Jun/21 14:16
    Worklog Time Spent: 10m 
      Work Description: michaelpearce-gain edited a comment on pull request 
#3555:
URL: https://github.com/apache/activemq-artemis/pull/3555#issuecomment-853899869


   So avoiding split brain is simple with a distributed lock and ZK: only one 
process can hold the lock, and to be active you must hold it. The application 
holding the lock should constantly check that it still has it, and disable 
itself if not. You should never become active unless you acquire the lock, and 
there should be no epoch involved; if you rely on one, you're asking for 
disaster.
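The rule above (active only while holding the lock, self-disable the moment it is lost) can be sketched as follows. This is a minimal illustration, not Artemis code: the `DistributedLock` interface and all names here are hypothetical stand-ins for whatever the ZK-backed lock would provide.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical lock interface; a real implementation would wrap a ZK recipe.
interface DistributedLock {
    boolean tryAcquire();   // non-blocking attempt to take the lock
    boolean stillHeld();    // re-check ownership against the coordinator
}

final class ActivationGuard {
    private final DistributedLock lock;
    private final AtomicBoolean active = new AtomicBoolean(false);

    ActivationGuard(DistributedLock lock) { this.lock = lock; }

    /** Become active only once the lock is acquired; never before. */
    boolean tryActivate() {
        if (lock.tryAcquire()) {
            active.set(true);
        }
        return active.get();
    }

    /** Called periodically: if the lock was lost, disable ourselves at once. */
    void checkLock() {
        if (active.get() && !lock.stillHeld()) {
            active.set(false);   // better to go inactive than risk split brain
        }
    }

    boolean isActive() { return active.get(); }
}
```

The point of `checkLock()` is that activation is never sticky: losing the lock immediately demotes the node, with no epoch or vote to appeal to.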
   
   With ZK, the number of Artemis nodes no longer makes any difference to 
avoiding the split: you should be able to run just 2 Artemis nodes for HA (ZK 
itself does require a minimum of 3), but the two scale independently, since 
you're not scaling the ZK ensemble for the data-plane use case.
   
   Now the only remaining issue is a full cluster stop and start: how do you 
know who has the most correct data? It will be the instance known to have held 
the lock last. So, as long as on taking the lock you stamp that information in 
ZK, then after a full stop, on full start, ONLY the instance matching that 
stamp is allowed to take the lock initially.
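The stamping idea can be shown with a small sketch. The in-memory map below is only a stand-in for the znode that would hold the stamp in ZK; the class and method names are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in for the ZK node that records the last lock holder's id.
final class LastHolderStore {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** On taking the lock, stamp our node id as the last known holder. */
    void stampOnAcquire(String nodeId) {
        store.put("last-holder", nodeId);
    }

    /** After a full stop, only the stamped node may take the lock initially. */
    boolean mayTakeLockInitially(String nodeId) {
        return nodeId.equals(store.get("last-holder"));
    }
}
```

Because the stamp is written at acquire time, it survives a full outage and identifies the node whose journal is known to be the most recent.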
   
   Ideally every node also registers itself in ZK with a last-seen date, so 
that if you truly lose all Artemis nodes, including whichever node was active 
at the point of disaster, you still know which node would be the next best. 
After a (configurable) timeout on full cluster start, if the old leader doesn't 
take the lock, the node with the next latest last-seen date is allowed to try 
to take it.
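The fallback election described above can be sketched as a pure decision function: before the timeout only the stamped former leader may try; after it, the node with the most recent last-seen timestamp becomes eligible too. The registry map and class names are hypothetical, standing in for data that would live in ZK.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;

// Decides which node may try the lock on a full-cluster start.
final class StartupElection {
    private final String lastHolder;              // stamped former leader
    private final Map<String, Instant> lastSeen;  // nodeId -> last heartbeat

    StartupElection(String lastHolder, Map<String, Instant> lastSeen) {
        this.lastHolder = lastHolder;
        this.lastSeen = lastSeen;
    }

    /**
     * The old leader may always try. Other nodes may only try once the
     * timeout has expired, and then only the one seen most recently.
     */
    boolean mayTryLock(String nodeId, boolean timeoutExpired) {
        if (nodeId.equals(lastHolder)) {
            return true;
        }
        if (!timeoutExpired) {
            return false;
        }
        Optional<String> nextBest = lastSeen.entrySet().stream()
            .filter(e -> !e.getKey().equals(lastHolder))
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
        return nextBest.map(nodeId::equals).orElse(false);
    }
}
```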
   
   The last two paragraphs are JUST for a full cluster outage. Once you have 
the cluster back, you know all nodes are equal and you rely solely on the lock 
again; in the case of leader loss, any node is equally entitled to take the 
lock.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 605946)
    Time Spent: 10h 10m  (was: 10h)

> Implements pluggable Quorum Vote
> --------------------------------
>
>                 Key: ARTEMIS-2716
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2716
>             Project: ActiveMQ Artemis
>          Issue Type: New Feature
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>         Attachments: backup.png, primary.png
>
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> This task aims to deliver a new Quorum Vote mechanism for Artemis with the 
> following objectives:
> # to make it pluggable
> # to cleanly separate the election phase from the cluster member states
> # to simplify the most common setups in both the amount of configuration and 
> the requirements (e.g. "witness" nodes could be implemented to support single 
> master-slave pairs)
> Post-actions to help people adopt it, which need to be thought through upfront:
> # a clean upgrade path for current HA replication users
> # deprecate the current HA replication or integrate it into the new version



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
