[
https://issues.apache.org/jira/browse/ARTEMIS-3429?focusedWorklogId=641619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641619
]
ASF GitHub Bot logged work on ARTEMIS-3429:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 25/Aug/21 11:26
Start Date: 25/Aug/21 11:26
Worklog Time Spent: 10m
Work Description: franz1981 commented on pull request #3694:
URL: https://github.com/apache/activemq-artemis/pull/3694#issuecomment-905414433
I'm still waiting for @gtully to come back so we can quickly review this before
merging. CI is already good...
The only point left to decide is when to persist the local replica's NodeID and
activation sequence:
- I've decided to persist them right after the initial sync happens, because
the replication process should already take care (on the replicated server) to
await the replica's response before answering the client, i.e. the data delta
from the initial sync shouldn't diverge between live and backup while the
coordinated activation sequence is still the same (will need @clebertsuconic's
opinion here too)
- `classic` replication, instead, persists them only if the backup is stopped or
if the backup successfully fails over
The implication of using these strategies is that, after a simultaneous crash
of both brokers and a restart of just the backup:
- the former allows it to start as live because its data is in sync (that's
correct)
- the latter prevents it from starting as live, because it didn't store the
NodeID and local activation sequence, hence it still appears as an empty backup
To me, using the former strategy increases HA in case of crashes, assuming
that the replication process is correctly syncing data (dealing correctly with
delta/in-flight changes after the initial sync).
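
The difference between the two strategies can be sketched with a toy model.
This is not Artemis code: the class, the enum, the method names and the rule
used in `canStartAsLive` (a persisted NodeID plus a persisted sequence equal to
the coordinated one) are all assumptions made for illustration, and the
"persisted" fields merely stand in for on-disk state.

```java
// Hypothetical model of the two persistence strategies; not the Artemis API.
public class PersistStrategySketch {

    enum Strategy { AFTER_INITIAL_SYNC, ON_STOP_OR_FAILOVER }

    static final class Backup {
        private final Strategy strategy;
        // In-memory values learned from the live broker during replication.
        private String nodeId;
        private long activationSequence;
        // Stand-in for on-disk state: these fields survive crash().
        private String persistedNodeId;
        private long persistedActivationSequence;

        Backup(Strategy strategy) {
            this.strategy = strategy;
        }

        void initialSyncDone(String liveNodeId, long liveSequence) {
            nodeId = liveNodeId;
            activationSequence = liveSequence;
            if (strategy == Strategy.AFTER_INITIAL_SYNC) {
                persist(); // the "former" strategy: persist right after the sync
            }
        }

        void stopCleanly() {
            persist(); // a clean stop persists under either strategy
        }

        void crash() {
            // Only the in-memory values are lost.
            nodeId = null;
            activationSequence = 0;
        }

        // Assumed rule: an empty backup (nothing persisted) cannot go live alone.
        boolean canStartAsLive(long coordinatedSequence) {
            return persistedNodeId != null
                && persistedActivationSequence == coordinatedSequence;
        }

        private void persist() {
            persistedNodeId = nodeId;
            persistedActivationSequence = activationSequence;
        }
    }

    // Both brokers crash after the initial sync; only the backup is restarted.
    static boolean restartAloneAfterCrash(Strategy strategy) {
        Backup backup = new Backup(strategy);
        backup.initialSyncDone("live-node-id", 3L);
        backup.crash();
        return backup.canStartAsLive(3L);
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + restartAloneAfterCrash(s));
        }
    }
}
```

In this model `AFTER_INITIAL_SYNC` yields `true` and `ON_STOP_OR_FAILOVER`
yields `false`, matching the two outcomes listed above. It deliberately ignores
in-flight changes after the initial sync, which is exactly the assumption the
former strategy depends on.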
Issue Time Tracking
-------------------
Worklog Id: (was: 641619)
Remaining Estimate: 0h
Time Spent: 10m
> Backup forget coordination-id after quorum loss
> -----------------------------------------------
>
> Key: ARTEMIS-3429
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3429
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.18.0
> Reporter: Francesco Nigro
> Assignee: Francesco Nigro
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Assuming a multi-primary set-up, if the broker acting as backup loses quorum,
> it is restarted without applying the coordination-id patching on NodeManager:
> if its local activation sequence is > 0 (because of a past sync with the
> other live) the backup succeeds in activating, causing a split-brain (although
> its NodeID is a random one vs the original coordination-id).