[
https://issues.apache.org/jira/browse/ARTEMIS-2716?focusedWorklogId=605945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-605945
]
ASF GitHub Bot logged work on ARTEMIS-2716:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 03/Jun/21 14:15
Start Date: 03/Jun/21 14:15
Worklog Time Spent: 10m
Work Description: michaelpearce-gain edited a comment on pull request #3555:
URL: https://github.com/apache/activemq-artemis/pull/3555#issuecomment-853899869
Avoiding split brain is simple with a distributed lock and ZK: only one
process can hold the lock, and to be active you must hold it. The
application holding the lock should constantly check that it still has it
and, if not, disable itself. You should never become active unless you get the
lock; there should be no epoch about it. If you do that, you're asking for
disaster.
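The rule above (active only while holding the lock, and self-disabling when the lock is lost) can be sketched roughly as follows. This is a minimal in-memory illustration, not Artemis or ZooKeeper code: `InMemoryLock`, `Broker`, and all method names are hypothetical stand-ins, and a real setup would use a ZK-backed lock with session expiry instead.

```python
class InMemoryLock:
    """Hypothetical stand-in for a ZK-backed distributed lock."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, node_id):
        # Only one holder at a time; acquiring is a no-op if already held by us.
        if self.holder is None:
            self.holder = node_id
        return self.holder == node_id

    def is_held_by(self, node_id):
        return self.holder == node_id

    def expire(self):
        # Simulates the holder's session being lost (e.g. a network partition).
        self.holder = None


class Broker:
    """Hypothetical broker node that is active only while it holds the lock."""

    def __init__(self, node_id, lock):
        self.node_id = node_id
        self.lock = lock
        self.active = False

    def try_activate(self):
        # Never become active unless the lock was actually obtained.
        self.active = self.lock.try_acquire(self.node_id)
        return self.active

    def check_lock(self):
        # Meant to be called constantly; disables the node if the lock is gone.
        if self.active and not self.lock.is_held_by(self.node_id):
            self.active = False
        return self.active


lock = InMemoryLock()
a, b = Broker("a", lock), Broker("b", lock)
a.try_activate()   # a gets the lock and becomes active
b.try_activate()   # b is refused and stays passive
lock.expire()      # a silently loses the lock
a.check_lock()     # a notices and disables itself
b.try_activate()   # only now can b become active
```

The window between losing the lock and the next `check_lock()` call is why the check has to run constantly: the shorter the interval, the shorter the time two nodes could both believe they are active.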
With ZK, the number of artemis nodes no longer makes any difference to
avoiding split brain: you should be able to run just 2 artemis nodes for HA.
(ZK itself does require a minimum of 3 nodes, but those scale differently:
you're not scaling them for the data-plane use case.)
Now the only issue is a full cluster stop and start: how do you
know who has the most correct data? That will be the instance that is
known to have held the lock last. So as long as, on taking the lock, you stamp
something in ZK with that information, then after a full stop, on full start,
ONLY the instance that matches that stamp is allowed to take the lock
initially.
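A rough sketch of the stamping idea, under stated assumptions: `Coordinator` is a hypothetical in-memory stand-in for ZK, where `lock_holder` plays the role of an ephemeral lock and `last_leader` the role of a persistent stamp. The `cold_start` flag (whether the caller is starting up after a full stop rather than taking over as an in-sync backup) is also an assumption; the comment does not say how a node would determine this.

```python
class Coordinator:
    """Hypothetical in-memory stand-in for the state kept in ZK."""

    def __init__(self):
        self.lock_holder = None   # ephemeral: disappears when the holder goes away
        self.last_leader = None   # persistent stamp: survives a full cluster stop

    def try_acquire(self, node_id, cold_start):
        if self.lock_holder is not None:
            return False
        # On a cold start, only the node matching the stamp may take the lock.
        if cold_start and self.last_leader not in (None, node_id):
            return False
        self.lock_holder = node_id
        self.last_leader = node_id  # stamp who held the lock last
        return True

    def release(self):
        # Covers both leader loss and a full stop: the lock goes, the stamp stays.
        self.lock_holder = None


zk = Coordinator()
zk.try_acquire("a", cold_start=True)    # first ever start: no stamp yet, a wins
zk.release()                            # full cluster stop
zk.try_acquire("b", cold_start=True)    # refused: the stamp says a was last leader
zk.try_acquire("a", cold_start=True)    # allowed: a matches the stamp
zk.release()                            # a is lost while the cluster is running
zk.try_acquire("b", cold_start=False)   # normal failover: the lock alone decides
```

Writing the stamp in the same step as taking the lock keeps the two from disagreeing; with real ZK this would need to be done atomically, which this sketch does not attempt to model.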
Ideally every node registers itself in ZK with a last-seen date, so that in case
you truly lost all artemis nodes, including whichever node was active at the point
of disaster, you know which node would be the next best. As such, after a
(configurable) timeout on full cluster start, if the old leader doesn't take the
lock, the node with the next-latest last-seen date is allowed to try to take the
lock.
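The fallback rule could look something like the sketch below. Everything here is illustrative: the function name, its parameters, and the use of plain numbers as timestamps are assumptions, and the last-seen registry is modelled as a simple dict rather than ZK nodes.

```python
def eligible_candidate(last_leader, last_seen, started_at, now, timeout):
    """Return the node id currently allowed to try to take the lock.

    last_leader: node id stamped as the last lock holder
    last_seen:   dict of node id -> last-seen timestamp (hypothetical registry)
    started_at:  time the full cluster start began
    now:         current time
    timeout:     configurable grace period reserved for the old leader
    """
    # Within the grace period only the old leader may take the lock.
    if now - started_at < timeout:
        return last_leader
    # After the timeout, fall back to the most recently seen other node.
    others = {n: t for n, t in last_seen.items() if n != last_leader}
    if not others:
        return last_leader
    return max(others, key=others.get)


registry = {"a": 100.0, "b": 95.0, "c": 90.0}
# 5 time units after start, with a timeout of 30: still the old leader's turn.
first = eligible_candidate("a", registry, started_at=200.0, now=205.0, timeout=30.0)
# 31 time units after start: "b" was seen most recently among the others.
later = eligible_candidate("a", registry, started_at=200.0, now=231.0, timeout=30.0)
```

Here `first` is `"a"` and `later` is `"b"`. The old leader keeps priority for the whole timeout, so the fallback only trades some data freshness for availability once that node is presumed gone.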
Once you have the cluster back, you know all nodes are equal and you rely solely
on the lock again; in case of leader loss, any node is equally entitled to take
the lock.
Issue Time Tracking
-------------------
Worklog Id: (was: 605945)
Time Spent: 10h (was: 9h 50m)
> Implements pluggable Quorum Vote
> --------------------------------
>
> Key: ARTEMIS-2716
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2716
> Project: ActiveMQ Artemis
> Issue Type: New Feature
> Reporter: Francesco Nigro
> Assignee: Francesco Nigro
> Priority: Major
> Attachments: backup.png, primary.png
>
> Time Spent: 10h
> Remaining Estimate: 0h
>
> This task aims to deliver a new Quorum Vote mechanism for artemis with the
> objectives:
> # to make it pluggable
> # to cleanly separate the election phase and the cluster member states
> # to simplify most common setups in both amount of configuration and
> requirements (eg "witness" nodes could be implemented to support single
> master-slave pairs)
> Post-actions to help people adopt it, which need to be thought through upfront:
> # a clean upgrade path for current HA replication users
> # deprecate or integrate the current HA replication into the new version