[ https://issues.apache.org/jira/browse/ARTEMIS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Bertram resolved ARTEMIS-3430.
-------------------------------------
    Fix Version/s: 2.19.0
       Resolution: Fixed

> Activation Sequence Auto-Repair
> -------------------------------
>
>                 Key: ARTEMIS-3430
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3430
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Major
>             Fix For: 2.19.0
>
>
> This can be seen either as a bug or as an improvement to the existing 
> self-heal behaviour of the activation sequence introduced by 
> https://issues.apache.org/jira/browse/ARTEMIS-3340.
> In short, the existing protocol to increase the activation sequence while 
> running un-replicated is:
> # remote i -> -(i + 1), i.e. remote CLAIM
> # local i -> (i + 1), i.e. local commit
> # remote -(i + 1) -> (i + 1), i.e. remote COMMIT
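A minimal Java sketch of the three steps above, assuming the coordinated sequence can be modelled by an `AtomicLong` (a stand-in for the distributed primitive) and the local sequence by a plain field. `ActivationProtocol`, `increaseUnreplicated` and the `crashAfterStep` parameter are hypothetical names used only to simulate a partial failure; this is not the Artemis API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the un-replicated increase protocol (not Artemis code).
final class ActivationProtocol {
   // Stand-in for the coordinated (remote) activation sequence;
   // a negative value -(i + 1) means "claimed but not yet committed".
   final AtomicLong coordinated = new AtomicLong();
   // Stand-in for the broker's local (on-disk) activation sequence.
   long local;

   /**
    * Runs steps 1-3. crashAfterStep = 1 or 2 simulates a failure right after
    * that step; any other value lets the protocol complete.
    */
   void increaseUnreplicated(int crashAfterStep) {
      final long i = local;
      // 1. remote CLAIM: i -> -(i + 1)
      if (!coordinated.compareAndSet(i, -(i + 1))) {
         throw new IllegalStateException("coordinated sequence is not " + i);
      }
      if (crashAfterStep == 1) {
         return;
      }
      // 2. local commit: i -> (i + 1)
      local = i + 1;
      if (crashAfterStep == 2) {
         return;
      }
      // 3. remote COMMIT: -(i + 1) -> (i + 1)
      if (!coordinated.compareAndSet(-(i + 1), i + 1)) {
         throw new IllegalStateException("claim was lost");
      }
   }
}
```

Starting from local and coordinated both at 0, a complete run leaves both at 1; `increaseUnreplicated(1)` leaves the coordinated value at -1 with the local one still at 0, while `increaseUnreplicated(2)` leaves the coordinated value at -1 with the local one at 1.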
> This protocol is designed to let witness brokers recognize whether their 
> data is no longer up-to-date, and to save them from throwing it away while 
> it is still valuable, if a partial failure occurs while the activation 
> sequence is being increased.
> In the current version, self-repair is allowed only if the live broker has 
> performed step 2 but not step 3, i.e. its local activation sequence is 
> updated but the coordinated one isn't committed yet.
> If the failed broker is restarted, it can "fix" the coordinated sequence and 
> move on to become live again; but if step 2 fails (or never happens), the 
> coordinated activation sequence cannot be fixed without admin intervention, 
> after inspecting *all* local activation sequences.
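The current self-repair rule can be sketched as a pure decision function over a broker's local value and the coordinated value (negative while claimed). `SelfRepair` and its `Decision` values are illustrative names, not Artemis API, and the split between "needs admin" and "not in sync" is an assumption made for the sketch.

```java
// Hypothetical sketch of the self-repair decision described above (not Artemis code).
final class SelfRepair {
   enum Decision { LIVE, SELF_REPAIR_THEN_LIVE, NEEDS_ADMIN, NOT_IN_SYNC }

   static Decision decide(long local, long coordinated) {
      if (coordinated == local) {
         // In sync: local data is up-to-date.
         return Decision.LIVE;
      }
      if (coordinated >= 0) {
         // A different committed value: this broker's data is out of date.
         return Decision.NOT_IN_SYNC;
      }
      final long claimed = -coordinated;
      if (claimed == local) {
         // Step 2 done, step 3 missing: this broker can commit the claim itself.
         return Decision.SELF_REPAIR_THEN_LIVE;
      }
      if (claimed == local + 1) {
         // Step 2 may never have happened: the claim cannot be fixed safely.
         return Decision.NEEDS_ADMIN;
      }
      // Further behind than the claim: stale regardless.
      return Decision.NOT_IN_SYNC;
   }
}
```

For example, a broker with local 1 seeing -1 may self-repair, whereas a broker with local 0 seeing the same -1 is stuck waiting for an admin.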
> Other brokers cannot "fix" the sequence because the local sequence of the 
> failed broker is unknown: simply rolling back the claimed one (to the 
> previous or to the right committed value) can make the failed broker believe 
> it has up-to-date data too, causing journal misalignments.
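To see the ambiguity concretely: while the coordinated sequence is stuck at -(i + 1), the failed broker's local value is either i (crash after step 1) or i + 1 (crash after step 2), and witnesses cannot tell which. The hypothetical sketch below (`RollbackAmbiguity` is an illustrative name) lists which candidate repair targets would leave the failed broker looking in sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: which repaired values could match
// some possible local sequence of the failed broker?
final class RollbackAmbiguity {
   static List<Long> unsafeTargets(long i) {
      // The failed broker crashed after step 1 (local = i)
      // or after step 2 (local = i + 1).
      final long[] possibleFailedLocals = {i, i + 1};
      final long[] candidateTargets = {i, i + 1, i + 2};
      final List<Long> unsafe = new ArrayList<>();
      for (long target : candidateTargets) {
         for (long failedLocal : possibleFailedLocals) {
            if (failedLocal == target) {
               // The failed broker would believe it is up-to-date.
               unsafe.add(target);
               break;
            }
         }
      }
      return unsafe;
   }
}
```

For i = 5, `unsafeTargets(5)` returns `[5, 6]`: both the previous value and the claimed one can match the failed broker, and only a value past the claim matches neither of its possible states.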
> The solution can be to fix the claimed sequence by moving it to the right 
> committed value while forbidding any broker from running un-replicated with 
> it. This is achieved by increasing it further *after* it has been repaired: 
> this prematurely ages the other in-sync brokers (including the failed one), 
> but allows auto-repair without admin intervention.
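A hypothetical sketch of the proposed auto-repair for a broker whose local value is one behind the claim (local i, coordinated -(i + 1)): it moves the claim to the committed value i + 1, then runs a further increase to i + 2. It assumes the caller already holds the activation lock, uses an `AtomicLong` as a stand-in for the coordinated sequence, and follows the description above rather than the actual 2.19.0 code; `AutoRepair` and `tryAutoRepair` are illustrative names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed auto-repair; assumes the caller holds
// the activation lock, so no other broker races on `coordinated`.
final class AutoRepair {
   final AtomicLong coordinated; // stand-in for the coordinated sequence
   long local;                   // stand-in for this broker's local sequence

   AutoRepair(AtomicLong coordinated, long local) {
      this.coordinated = coordinated;
      this.local = local;
   }

   /** Returns true if this broker repaired the sequence and may go live. */
   boolean tryAutoRepair() {
      final long current = coordinated.get();
      // Only a claim exactly one ahead qualifies: local i, coordinated -(i + 1).
      if (current >= 0 || -current != local + 1) {
         return false;
      }
      final long repaired = -current; // i + 1
      // Repair: move the claimed value to the right committed value.
      local = repaired;
      casOrFail(current, repaired);
      // Increase further *after* repairing (i + 1 -> i + 2), so nobody can run
      // un-replicated on the repaired value; other in-sync brokers age early.
      final long next = repaired + 1; // i + 2
      casOrFail(repaired, -next);     // remote CLAIM
      local = next;                   // local commit
      casOrFail(-next, next);         // remote COMMIT
      return true;
   }

   private void casOrFail(long expected, long update) {
      if (!coordinated.compareAndSet(expected, update)) {
         throw new IllegalStateException("unexpected coordinated sequence " + coordinated.get());
      }
   }
}
```

With a witness at local 4 and the coordinated sequence stuck at -5, `tryAutoRepair()` leaves both values at 6, so a failed broker holding either 4 or 5 locally is no longer in sync. The extra increase is the price paid for not knowing which of those two values the failed broker holds.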
> The sole drawback of this strategy is that a further failure of the 
> repairing broker while increasing the sequence gives it exclusive 
> responsibility to auto-repair (again, on restart), because no other broker 
> can have a high-enough local sequence.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
