[ 
https://issues.apache.org/jira/browse/ARTEMIS-4143?focusedWorklogId=841776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841776
 ]

ASF GitHub Bot logged work on ARTEMIS-4143:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jan/23 16:01
            Start Date: 26/Jan/23 16:01
    Worklog Time Spent: 10m 
      Work Description: jbertram opened a new pull request, #4344:
URL: https://github.com/apache/activemq-artemis/pull/4344

   Configurations employing shared-storage with NFS are susceptible to 
split-brain in certain scenarios. For example:
   
     1) Primary loses network connection to NFS.
     2) Backup activates.
     3) Primary reconnects to NFS.
     4) Split-brain.
   
   In practice this situation is unlikely due to the timing involved, but 
the possibility still exists. Currently the file lock held by the primary 
broker on the NFS share provides essentially no protection in this situation. 
This commit adds logic that updates the lock file's timestamp during 
activation and then checks it periodically at runtime to ensure consistency. 
This effectively mitigates split-brain in this situation (and likely others). 
Here's how it works now:
   
     1) Primary loses network connection to NFS.
     2) Backup activates.
     3) Primary reconnects to NFS.
     4) Primary detects that the lock file's timestamp has been updated and
        shuts itself down.
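   
   The timestamp check in the steps above could be sketched roughly as 
follows. This is only an illustration of the idea, not the code from this 
pull request: the class `LockFileMonitor`, its methods `activate()` and 
`check()`, and the `onLockLost` callback are hypothetical names, and the 
periodic scheduling of `check()` is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch; the names here are assumptions, not Artemis APIs.
final class LockFileMonitor {
    private final Path lockFile;
    private final Runnable onLockLost;
    private FileTime expected;

    LockFileMonitor(Path lockFile, Runnable onLockLost) {
        this.lockFile = lockFile;
        this.onLockLost = onLockLost;
    }

    // On activation: stamp the lock file and remember the stored timestamp.
    synchronized void activate() throws IOException {
        Files.setLastModifiedTime(lockFile, FileTime.fromMillis(System.currentTimeMillis()));
        // Read the value back because the file system may round the timestamp.
        expected = Files.getLastModifiedTime(lockFile);
    }

    // At runtime: compare the current timestamp with the remembered one.
    // A mismatch means another broker activated, so the callback is invoked.
    synchronized boolean check() throws IOException {
        FileTime actual = Files.getLastModifiedTime(lockFile);
        if (!actual.equals(expected)) {
            onLockLost.run();
            return false;
        }
        return true;
    }
}
```

   In this sketch a broker would call `activate()` once when it becomes 
live and then call `check()` on a timer, passing its own shutdown routine as 
`onLockLost`.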
   
   When the primary shuts down in step #4 the Topology on the backup can be 
damaged. Protections were added for this via ARTEMIS-2868 but only for the 
replicated use-case. This commit applies these protections unconditionally so 
that the Topology remains intact.
   
   There are no tests for these changes as I cannot determine how to properly 
simulate this use-case. However, there have never been robust, automated tests 
for these kinds of NFS use-cases so this is not a departure from the norm.




Issue Time Tracking
-------------------

            Worklog Id:     (was: 841776)
    Remaining Estimate: 0h
            Time Spent: 10m

> Improve mitigation against split-brain with shared-storage
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-4143
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4143
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
