[jira] [Work logged] (ARTEMIS-3429) Backup forget coordination-id after quorum loss

ASF GitHub Bot (Jira) Wed, 25 Aug 2021 04:27:08 -0700


     [ 
https://issues.apache.org/jira/browse/ARTEMIS-3429?focusedWorklogId=641619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-641619
 ]


ASF GitHub Bot logged work on ARTEMIS-3429:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Aug/21 11:26
            Start Date: 25/Aug/21 11:26
    Worklog Time Spent: 10m 
      Work Description: franz1981 commented on pull request #3694:
URL: https://github.com/apache/activemq-artemis/pull/3694#issuecomment-905414433


   I'm still waiting for @gtully to come back so we can quickly review this 
before merging. CI is already good.
   The only point left to decide is when to persist the local replica NodeID 
and activation sequence:
   
   - I've decided to persist them right after the initial sync happens, because 
the replication process should already take care (on the replicated server) to 
await the replica's response before answering the client, i.e. the data delta 
after the initial sync shouldn't diverge between live and backup while the 
coordinated activation sequence is still the same (I will need 
@clebertsuconic's opinion here too)
   - `classic` replication, instead, persists them only if the backup is 
stopped or successfully fails over
   
   The implication of these strategies is that, after a simultaneous crash 
of both brokers and a restart of just the backup:
   - the former allows it to start as live, because its data is in sync (that's 
correct)
   - the latter prevents it from starting as live, because it didn't store the 
NodeID and local activation sequence, hence it still appears as an empty backup
   
   To me, the former strategy increases HA in case of crashes, assuming 
that the replication process is correctly syncing data (dealing correctly with 
delta/in-flight changes after the initial sync).
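
   To make the difference between the two strategies concrete, here is a 
minimal sketch of the crash-and-restart scenario described above. It is a toy 
model, not Artemis code: `ReplicaRestartSketch`, `Backup`, `Strategy` and all 
method names are hypothetical, and the rule "the backup may start as live only 
if it has a persisted activation sequence equal to the coordinated one" is an 
assumption used for illustration.

```java
import java.util.Optional;

// Hypothetical toy model of the two persistence strategies; none of these
// names exist in the Artemis code base.
final class ReplicaRestartSketch {

    enum Strategy {
        PERSIST_AFTER_INITIAL_SYNC,  // strategy proposed in the comment
        PERSIST_ON_STOP_OR_FAILOVER  // what `classic` replication is described as doing
    }

    /** NodeID plus activation sequence, as stored on the backup's disk. */
    static final class Persisted {
        final String nodeId;
        final long activationSequence;

        Persisted(String nodeId, long activationSequence) {
            this.nodeId = nodeId;
            this.activationSequence = activationSequence;
        }
    }

    static final class Backup {
        private final Strategy strategy;
        private Optional<Persisted> onDisk = Optional.empty();
        private Persisted inMemory; // lost when the broker crashes

        Backup(Strategy strategy) {
            this.strategy = strategy;
        }

        /** Called once the initial sync with the live broker has finished. */
        void initialSyncCompleted(String nodeId, long activationSequence) {
            inMemory = new Persisted(nodeId, activationSequence);
            if (strategy == Strategy.PERSIST_AFTER_INITIAL_SYNC) {
                onDisk = Optional.of(inMemory);
            }
        }

        /** Clean stop or successful fail-over: both strategies persist here. */
        void stoppedOrFailedOver() {
            if (inMemory != null) {
                onDisk = Optional.of(inMemory);
            }
        }

        /** Crash: whatever was only in memory is gone. */
        void crash() {
            inMemory = null;
        }

        /** Assumed rule: start as live only with a matching persisted sequence. */
        boolean canStartAsLive(long coordinatedActivationSequence) {
            return onDisk
                .map(p -> p.activationSequence == coordinatedActivationSequence)
                .orElse(false);
        }
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            Backup backup = new Backup(s);
            backup.initialSyncCompleted("node-1", 1L);
            backup.crash(); // both brokers crash; only the backup restarts
            System.out.println(s + " -> canStartAsLive=" + backup.canStartAsLive(1L));
        }
    }
}
```

   Under these assumptions, the backup that persisted right after the initial 
sync can start as live after the crash, while the `classic`-style backup 
cannot, because nothing was written to disk before the crash.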


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

            Worklog Id:     (was: 641619)
    Remaining Estimate: 0h
            Time Spent: 10m

> Backup forget coordination-id after quorum loss
> -----------------------------------------------
>
>                 Key: ARTEMIS-3429
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3429
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.18.0
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Assuming a multi-primary set-up, if the broker acting as backup loses quorum 
> and is restarted without the coordination-id patching being applied to its 
> NodeManager, then, if its local activation sequence is > 0 (because of a past 
> sync with the other live), the backup succeeds in activating, causing a 
> split-brain (although its NodeID is a random one rather than the original 
> coordination-id).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
