jbertram opened a new pull request, #4344:
URL: https://github.com/apache/activemq-artemis/pull/4344
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In practice this situation is unlikely because of the timing involved, but
the possibility still exists. Currently the file lock held by the primary
broker on the NFS share offers no protection in this situation. This commit
adds logic by which the timestamp of the lock file is updated during activation
and then checked periodically at runtime to ensure consistency. This
effectively mitigates split-brain in this situation (and likely others). Here's
how it works now:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
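The steps above can be illustrated with a minimal sketch. This is not the code from this commit: the class and method names (`LockFileTimestampMonitor`, `markActivated`, `stillOwner`) are hypothetical, and the sketch assumes the lock file's last-modified time is the only signal being compared. It uses only the standard `java.nio.file` API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical illustration only; names are assumptions, not Artemis classes.
final class LockFileTimestampMonitor {
    private final Path lockFile;
    private FileTime expected; // timestamp this broker wrote when it activated

    LockFileTimestampMonitor(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Called once during activation: stamp the lock file, then read the value
    // back so later comparisons use what the file system actually stored
    // (its timestamp granularity may be coarser than milliseconds).
    void markActivated() throws IOException {
        Files.setLastModifiedTime(lockFile, FileTime.fromMillis(System.currentTimeMillis()));
        expected = Files.getLastModifiedTime(lockFile);
    }

    // Called periodically at runtime: returns false once the timestamp no
    // longer matches, i.e. some other broker has activated in the meantime.
    boolean stillOwner() throws IOException {
        return expected != null && expected.equals(Files.getLastModifiedTime(lockFile));
    }
}
```

A broker would call `markActivated()` when it becomes live and then poll `stillOwner()` on a timer, shutting itself down as soon as it returns `false`. How often to poll, and how to treat an `IOException` while the NFS share is unreachable, are left open here.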
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections against this were added via ARTEMIS-2868, but only for the
replicated use-case. This commit applies those protections in all cases so
that the Topology remains intact.
There are no tests for these changes as I cannot determine how to properly
simulate this use-case. However, there have never been robust, automated tests
for these kinds of NFS use-cases so this is not a departure from the norm.