Apache Dev created ARTEMIS-3030:
-----------------------------------

             Summary: Journal lock evaluation fails when NFS is temporarily 
disconnected
                 Key: ARTEMIS-3030
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3030
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.16.0
            Reporter: Apache Dev


Same scenario as ARTEMIS-2421.

If the network between the live broker (B1) and the NFS server is disconnected
(for example, by rejecting its TCP packets with iptables), the following
happens after the lock lease timeout:
 * Backup server (B2) becomes Live
 * When NFS connectivity of B1 is restored, B1 remains Live

So both brokers are live.

The issue seems to be caused by {{java.nio.channels.FileLock#isValid}}, which
is used in
{{org.apache.activemq.artemis.core.server.impl.FileLockNodeManager#isLiveLockLost}}:
it always returns true, even if the lock has meanwhile been lost and taken by
B2.

Do you recommend specific mount options for NFS?

Or should the lock evaluation be replaced with a more reliable mechanism? We
noticed that {{FileLock#isValid}} returns a cached value (true) even when NFS
connectivity is down, so it would be better to use a validation mechanism
that forces a query to the NFS server.
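
One possible shape for such a check, purely as a sketch: the live broker
writes an owner token into the lock file when it takes over, and later
re-reads the file to confirm the token is still its own. Everything here is an
assumption for illustration ({{OwnerTokenCheck}}, {{claim}}, {{stillOwner}},
the token layout); it is not how {{FileLockNodeManager}} works. Whether the
read actually reaches the NFS server depends on client caching and mount
options, which this sketch does not control.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Artemis.
final class OwnerTokenCheck {

    /** Writes the owner's token at offset 0 and asks the OS to flush it. */
    static void claim(FileChannel channel, long token) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            buf.putLong(token);
            buf.flip();
            long pos = 0;
            while (buf.hasRemaining()) {
                pos += channel.write(buf, pos);
            }
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True only if the token read back is ours; an I/O failure counts as lost. */
    static boolean stillOwner(FileChannel channel, long token) {
        try {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            long pos = 0;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    return false; // file is shorter than a token
                }
                pos += n;
            }
            buf.flip();
            return buf.getLong() == token;
        } catch (IOException e) {
            return false;
        }
    }

    /** Local simulation: "live" claims the file, then "backup" overwrites the token. */
    static boolean[] demo() {
        try {
            Path file = Files.createTempFile("owner-token", ".lock");
            try (FileChannel live = FileChannel.open(file,
                         StandardOpenOption.READ, StandardOpenOption.WRITE);
                 FileChannel backup = FileChannel.open(file,
                         StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                claim(live, 1L);
                boolean beforeTakeover = stillOwner(live, 1L);
                claim(backup, 2L);
                boolean afterTakeover = stillOwner(live, 1L);
                return new boolean[] {beforeTakeover, afterTakeover};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("owner before takeover: " + r[0]);
        System.out.println("owner after takeover:  " + r[1]);
    }
}
```

Unlike {{isValid}}, this check fails as soon as another broker has rewritten
the file, but it is only as fresh as the data the NFS client returns.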



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
