jbertram opened a new pull request, #4344:
URL: https://github.com/apache/activemq-artemis/pull/4344

   Configurations employing shared-storage with NFS are susceptible to 
split-brain in certain scenarios. For example:
   
     1) Primary loses network connection to NFS.
     2) Backup activates.
     3) Primary reconnects to NFS.
     4) Split-brain.
   
   In practice this situation is unlikely given the timing involved, but it is 
still possible. Currently the file lock held by the primary broker on the NFS 
share provides essentially no protection in this situation. This commit adds 
logic that updates the lock file's timestamp during activation and then 
periodically checks it at runtime to ensure consistency. This effectively 
mitigates split-brain in this situation (and likely others). Here's how it 
works now:
   
     1) Primary loses network connection to NFS.
     2) Backup activates.
     3) Primary reconnects to NFS.
     4) Primary detects that the lock file's timestamp has been updated and
        shuts itself down.
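   A minimal sketch of the timestamp check described above, written in Java. 
The class `LockFileTimestampMonitor`, its methods, and the fixed-rate polling 
are hypothetical names and choices for illustration; this is not the actual 
Artemis code from this PR. It only uses standard `java.nio.file` and 
`java.util.concurrent` calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the lock-file timestamp check; not the Artemis implementation.
final class LockFileTimestampMonitor {
    private final Path lockFile;
    private final Runnable onLockLost; // e.g. shut the broker down
    private FileTime expected;         // timestamp written by this broker at activation

    LockFileTimestampMonitor(Path lockFile, Runnable onLockLost) {
        this.lockFile = lockFile;
        this.onLockLost = onLockLost;
    }

    // Called after the file lock is acquired, while the broker activates.
    synchronized void stampOnActivation() throws IOException {
        Files.setLastModifiedTime(lockFile, FileTime.from(Instant.now()));
        // Re-read instead of keeping Instant.now(): the file system may store
        // the timestamp with coarser precision than the JVM clock.
        expected = Files.getLastModifiedTime(lockFile);
    }

    // Returns false (and fires the callback) if another broker has touched the lock file.
    synchronized boolean check() {
        if (expected == null) {
            throw new IllegalStateException("stampOnActivation() has not been called");
        }
        FileTime actual;
        try {
            actual = Files.getLastModifiedTime(lockFile);
        } catch (IOException e) {
            // Share unreachable: inconclusive, so try again on the next poll.
            return true;
        }
        if (!actual.equals(expected)) {
            onLockLost.run();
            return false;
        }
        return true;
    }

    // Polls check() at a fixed rate until the lock is found to be lost.
    ScheduledExecutorService startChecking(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (!check()) {
                scheduler.shutdown(); // stop polling once the lock is deemed lost
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

   In this sketch a failed read is treated as inconclusive rather than as a 
lost lock. That mirrors steps 3 and 4: while NFS is unreachable the primary 
cannot read the file, and once it reconnects the next check sees the newer 
timestamp written by the activating backup and triggers the shutdown callback.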
   
   When the primary shuts down in step #4, the Topology on the backup can be 
damaged. Protections were added for this via ARTEMIS-2868, but only for the 
replicated use-case. This commit applies these protections in all cases so 
that the Topology remains intact.
   
   There are no tests for these changes because I could not determine how to 
reliably simulate this use-case. However, there have never been robust, 
automated tests for these kinds of NFS use-cases, so this is not a departure 
from the norm.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]