[
https://issues.apache.org/jira/browse/ARTEMIS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260811#comment-17260811
]
Justin Bertram commented on ARTEMIS-3030:
-----------------------------------------
bq. We saw this gave "cached" results with NFS disconnections.
Without more details about your NFS mount options it's impossible to say what
the ultimate cause of this is.
bq. Instead, we found that a reliable mechanism is ensuring that the channel is
newly created during every read...
I think that's a viable strategy. Also, the code could use
{{java.nio.file.FileStore#getUsableSpace}} as that clearly avoids any NFS
client caching (as evidenced by the stack-trace in your previous comment).
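As a rough illustration of the two ideas above, here is a minimal sketch, not the actual {{FileLockNodeManager}} code. The class name {{FreshChannelRead}}, its method names, and the choice to read a single byte at offset 0 are assumptions made for the example. It opens a new channel for every read and, separately, queries {{FileStore#getUsableSpace}}. Whether either call really reaches the NFS server still depends on the client's mount options.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Artemis.
public class FreshChannelRead {

    // Reads the first byte of the file through a channel that is opened
    // for this call and closed again before returning.
    static byte readFirstByte(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1);
            int read = channel.read(buffer, 0); // absolute read at offset 0
            if (read < 1) {
                throw new IOException("No byte available in " + file);
            }
            return buffer.get(0);
        }
    }

    // Asks the file store that holds the file for its usable space.
    // An IOException here suggests the store could not be queried.
    static long usableSpace(Path file) throws IOException {
        FileStore store = Files.getFileStore(file);
        return store.getUsableSpace();
    }
}
```

A caller would treat an {{IOException}} from either method as a sign that the shared store is currently unreachable.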
bq. BTW, if the other check finds that the status is not live, shouldn't it set
"lostLock=true"?
The {{lostLock}} variable will be set to {{true}} if there is an exception
during the read.
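The pattern described here can be sketched as follows. This is a simplified, hypothetical illustration: {{LostLockSketch}}, {{hasLostLock}}, the {{Callable}} parameter, and the state-byte comparison are assumptions for the example, not the real Artemis implementation. The flag is set only when the read throws.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch; the real logic lives in FileLockNodeManager.
public class LostLockSketch {

    private volatile boolean lostLock = false;

    // Returns true when the lock should be considered lost.
    // stateReader stands in for whatever reads the state from the lock file.
    boolean isLiveLockLost(Callable<Byte> stateReader, byte liveState) {
        try {
            byte state = stateReader.call();
            // A state other than "live" is reported as lost,
            // but the flag itself is not touched on this path.
            return state != liveState;
        } catch (Exception e) {
            // Any failure during the read marks the lock as lost.
            lostLock = true;
            return true;
        }
    }

    boolean hasLostLock() {
        return lostLock;
    }
}
```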
> Journal lock evaluation fails when NFS is temporarily disconnected
> ------------------------------------------------------------------
>
> Key: ARTEMIS-3030
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3030
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.16.0
> Reporter: Apache Dev
> Priority: Blocker
>
> Same scenario as ARTEMIS-2421.
> If the network between the Live Broker (B1) and the NFS Server is disconnected
> (for example, by rejecting its TCP packets with iptables), the following
> happens after the lock lease timeout:
> * Backup server (B2) becomes Live
> * When NFS connectivity of B1 is restored, B1 remains Live
> So both brokers are live.
> The issue seems to be caused by {{java.nio.channels.FileLock#isValid}} used in
> {{org.apache.activemq.artemis.core.server.impl.FileLockNodeManager#isLiveLockLost}},
> because it always returns true, even if the lock was lost in the meantime
> and taken by B2.
> Do you suggest using specific mount options for NFS?
> Or should the lock evaluation be replaced with a more reliable mechanism? We
> noticed that {{FileLock#isValid}} returns a cached value (true), even
> when NFS connectivity is down, so it would be better to use a validation
> mechanism that forces a query to the NFS server.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)