[ https://issues.apache.org/jira/browse/ARTEMIS-4143?focusedWorklogId=841776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841776 ]
ASF GitHub Bot logged work on ARTEMIS-4143:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jan/23 16:01
Start Date: 26/Jan/23 16:01
Worklog Time Spent: 10m
Work Description: jbertram opened a new pull request, #4344:
URL: https://github.com/apache/activemq-artemis/pull/4344
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In practice this situation is unlikely because of the timing involved, but
the possibility still exists. Currently the file lock held by the primary
broker on the NFS share is essentially worthless in this situation. This commit
adds logic by which the timestamp of the lock file is updated during activation
and then routinely checked during runtime to ensure consistency. This
effectively mitigates split-brain in this situation (and likely others). Here's
how it works with this change:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
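The timestamp check described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the pull request: the class name `LockFileMonitor` and its methods are hypothetical, and the activation time is passed in explicitly so the behaviour is easy to reproduce. It relies only on the standard `java.nio.file` API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper for illustration; not an actual Artemis class.
final class LockFileMonitor {
    private final Path lockFile;
    private FileTime recorded; // timestamp this broker stamped at activation

    LockFileMonitor(Path lockFile) {
        this.lockFile = lockFile;
    }

    // On activation: stamp the lock file and remember the stored value.
    void recordActivation(long epochMillis) {
        try {
            if (Files.notExists(lockFile)) {
                Files.createFile(lockFile);
            }
            Files.setLastModifiedTime(lockFile, FileTime.fromMillis(epochMillis));
            // Re-read the value because the file system may round it.
            recorded = Files.getLastModifiedTime(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Routine check: false means another broker has stamped the file since
    // this broker activated, so this broker should shut itself down.
    boolean stillOwnsLock() {
        try {
            return recorded != null
                && recorded.equals(Files.getLastModifiedTime(lockFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a sketch like this, the primary would call `stillOwnsLock()` on a fixed schedule (for example from a `ScheduledExecutorService`) and begin shutdown the first time it returns `false`.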
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for the
replicated use-case. This commit applies these protections unconditionally so
that the Topology remains intact.
There are no tests for these changes as I cannot determine how to properly
simulate this use-case. However, there have never been robust, automated tests
for these kinds of NFS use-cases so this is not a departure from the norm.
Issue Time Tracking
-------------------
Worklog Id: (was: 841776)
Remaining Estimate: 0h
Time Spent: 10m
> Improve mitigation against split-brain with shared-storage
> ----------------------------------------------------------
>
> Key: ARTEMIS-4143
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4143
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Justin Bertram
> Assignee: Justin Bertram
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>