[
https://issues.apache.org/jira/browse/ARTEMIS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Justin Bertram resolved ARTEMIS-3430.
-------------------------------------
Fix Version/s: 2.19.0
Resolution: Fixed
> Activation Sequence Auto-Repair
> -------------------------------
>
> Key: ARTEMIS-3430
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3430
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Francesco Nigro
> Assignee: Francesco Nigro
> Priority: Major
> Fix For: 2.19.0
>
>
> This can be seen either as a bug or as an improvement over the existing
> self-heal behaviour of the activation sequence introduced by
> https://issues.apache.org/jira/browse/ARTEMIS-3340.
> In short, the existing protocol to increase the activation sequence while
> un-replicated is:
> # remote i -> -(i + 1), i.e. remote CLAIM
> # local i -> (i + 1), i.e. local commit
> # remote -(i + 1) -> (i + 1), i.e. remote COMMIT
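The three steps above can be sketched in Java as follows. This is only an in-memory model under stated assumptions: the class `ActivationSequenceSketch`, its fields and its method names are hypothetical and not Artemis APIs, and an `AtomicLong` stands in for the coordinated (remote) value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory model; names are illustrative, not Artemis APIs.
final class ActivationSequenceSketch {
    // Coordinated (remote) sequence: a negative value -(i + 1) marks a pending claim.
    final AtomicLong coordinated;
    // Sequence stored next to the broker's own journal.
    long local;

    ActivationSequenceSketch(long initial) {
        this.coordinated = new AtomicLong(initial);
        this.local = initial;
    }

    // Step 1: remote i -> -(i + 1). Fails if the remote value is not the local i.
    boolean claim() {
        long i = local;
        return coordinated.compareAndSet(i, -(i + 1));
    }

    // Step 2: local i -> (i + 1).
    void commitLocal() {
        local = local + 1;
    }

    // Step 3: remote -(i + 1) -> (i + 1), where (i + 1) is the already-updated local value.
    boolean commitRemote() {
        return coordinated.compareAndSet(-local, local);
    }

    public static void main(String[] args) {
        ActivationSequenceSketch broker = new ActivationSequenceSketch(3);
        System.out.println("claim ok: " + broker.claim() + ", coordinated=" + broker.coordinated.get());
        broker.commitLocal();
        System.out.println("local=" + broker.local);
        System.out.println("commit ok: " + broker.commitRemote() + ", coordinated=" + broker.coordinated.get());
    }
}
```

A crash between any two of these calls leaves the pair (local, coordinated) in one of the intermediate states discussed in the rest of the description.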
> This protocol was designed to let witness brokers recognize when their data
> is no longer up-to-date, and to save them from throwing it away while it is
> still valuable, if a partial failure occurs while increasing the activation
> sequence.
> In the current version, self-repair is allowed only if the live broker has
> performed 2. but not 3., i.e. the local activation sequence is updated, but
> the coordinated one isn't committed yet.
> If the failing broker is restarted, it can "fix" the coordinated sequence and
> move on to become live again, but if 2. fails (or just never happens), the
> coordinated activation sequence cannot be fixed without admin intervention,
> after inspecting *all* local activation sequences.
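The restart rule described above can be sketched as a small decision function. This is a hedged illustration rather than the broker's actual code: `RestartCheckSketch`, `Outcome` and `onRestart` are hypothetical names, and the handling of the case with no pending claim is an assumption of the sketch.

```java
// Hypothetical helper; the names are illustrative and not taken from Artemis.
final class RestartCheckSketch {
    enum Outcome { UP_TO_DATE, NOT_UP_TO_DATE, SELF_REPAIR, NEEDS_ADMIN }

    // local: sequence stored with the broker's journal.
    // coordinated: shared sequence; a negative value -(i + 1) is a pending claim.
    static Outcome onRestart(long local, long coordinated) {
        if (coordinated >= 0) {
            // Assumption of this sketch: with no pending claim,
            // only an exact match counts as up-to-date.
            return local == coordinated ? Outcome.UP_TO_DATE : Outcome.NOT_UP_TO_DATE;
        }
        long claimed = -coordinated;
        if (local == claimed) {
            // Step 2 happened, step 3 did not: this broker can commit the claim itself.
            return Outcome.SELF_REPAIR;
        }
        // Step 2 never happened here: before this change nobody could fix the claim.
        return Outcome.NEEDS_ADMIN;
    }

    public static void main(String[] args) {
        System.out.println(onRestart(4, -4)); // prints SELF_REPAIR
        System.out.println(onRestart(3, -4)); // prints NEEDS_ADMIN
    }
}
```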
> The reason other brokers cannot "fix" the sequence is that the local
> sequence of the failed broker is unknown, and just rolling back the claimed
> one (to the previous or to the right committed value) can make the failed
> broker believe it has up-to-date data too, causing journal misalignments.
> The solution can be to fix the claimed sequence by moving it to the right
> commit value, while forbidding any broker from running un-replicated using
> it. This is achieved by increasing it further *after* the repair: doing so
> prematurely ages the other in-sync brokers (including the failed one), but
> allows auto-repair without admin intervention.
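A minimal sketch of this repair-then-increase idea, assuming the same negative-claim encoding as the protocol above. The class `AutoRepairSketch` and its members are hypothetical, the eligibility check on the repairing broker is an assumption of the sketch, and the caller is assumed to already hold exclusive access to the coordinated value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the proposed auto-repair; not the Artemis implementation.
final class AutoRepairSketch {
    final AtomicLong coordinated; // shared sequence, negative while claimed
    long local;                   // this broker's own sequence

    AutoRepairSketch(long coordinated, long local) {
        this.coordinated = new AtomicLong(coordinated);
        this.local = local;
    }

    // Returns true if a pending claim was repaired and then aged by one more increase.
    boolean repairAndIncrease() {
        long pending = coordinated.get();
        if (pending >= 0) {
            return false; // no pending claim, nothing to repair
        }
        long claimed = -pending;
        // Assumption: only a broker at i or (i + 1) is considered in-sync enough to repair.
        if (local != claimed - 1 && local != claimed) {
            return false;
        }
        // Repair: move the claim to its commit value, -(i + 1) -> (i + 1).
        if (!coordinated.compareAndSet(pending, claimed)) {
            return false;
        }
        // Increase again so that no broker can run un-replicated using (i + 1).
        long next = claimed + 1;
        if (!coordinated.compareAndSet(claimed, -next)) {
            return false; // remote CLAIM failed
        }
        local = next; // local commit
        return coordinated.compareAndSet(-next, next); // remote COMMIT
    }

    public static void main(String[] args) {
        AutoRepairSketch repairer = new AutoRepairSketch(-5, 4);
        System.out.println(repairer.repairAndIncrease());
        System.out.println("coordinated=" + repairer.coordinated.get() + ", local=" + repairer.local);
    }
}
```

In this model a failed broker that comes back with a local value of 5 no longer matches the coordinated value of 6, so it cannot assume its data is up-to-date.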
> The sole drawback of this strategy is that a further failure of the
> repairing broker while increasing the sequence gives it exclusive
> responsibility to auto-repair (again, on restart), because no other broker
> can have a high-enough local sequence.