[jira] [Comment Edited] (ARTEMIS-3030) Journal lock evaluation fails when NFS is temporarily disconnected

Apache Dev (Jira) Thu, 07 Jan 2021 04:19:26 -0800


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260465#comment-17260465
 ]


Apache Dev edited comment on ARTEMIS-3030 at 1/7/21, 12:18 PM:
---------------------------------------------------------------

I think the problem is that the other check uses channels that were already 
created (FileLockNodeManager#lockChannels). We saw that this gave "cached" 
results during NFS disconnections.
Instead, we found that a reliable mechanism is to ensure that the channel is 
newly created for every read, for example:
{code}
        try (FileChannel channel = FileChannel.open(lockFile, READ, WRITE)) {
            channel.lock();
            ...
        }
{code}
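
A slightly fuller sketch of the same idea is given here, as an illustration only and not the FileLockNodeManager implementation. The class name FreshChannelRead and the helper readFirstByte are hypothetical, and reading the first byte of the lock file is an assumption about what the "read" inspects. The point it shows is that both the channel and the lock are created and released within a single call, so no channel is reused between checks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FreshChannelRead {

    // Hypothetical helper, not part of Artemis: reads the first byte of
    // lockFile through a channel that is opened just for this call.
    static byte readFirstByte(Path lockFile) throws IOException {
        // Both resources are closed when the block exits: the lock is
        // released first, then the channel is closed.
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            ByteBuffer buffer = ByteBuffer.allocate(1);
            // Positional read, so the channel's own position is irrelevant.
            int read = channel.read(buffer, 0);
            if (read < 1) {
                throw new IOException("lock file is empty: " + lockFile);
            }
            return buffer.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a local temporary file (assumption: a one-byte state 'L').
        Path lockFile = Files.createTempFile("server", ".lock");
        Files.write(lockFile, new byte[] {'L'});
        System.out.println((char) readFirstByte(lockFile));
        Files.delete(lockFile);
    }
}
```

One caveat with this sketch: channel.lock() blocks until the lock is granted, and it throws OverlappingFileLockException if the same JVM already holds an overlapping lock on the file through another channel, so a real check would have to account for locks the broker itself still holds.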

BTW, if the other check finds that the status is not live, shouldn't it set 
"lostLock=true"?

I will provide the NFS mount options we tested as soon as possible.
Thanks!



was (Author: apachedev):
I think the problem is that the other check uses channels that were already 
created (FileLockNodeManager#lockChannels). We saw that this gave "cached" 
results during NFS disconnections.
Instead, we found that a reliable mechanism is to ensure that the channel is 
newly created for every read, for example:
{code}
        try (FileChannel channel = FileChannel.open(lockFile, READ, WRITE)) {
            channel.lock();
            ...
        }
{code}

I will provide the NFS mount options we tested as soon as possible.
Thanks!


> Journal lock evaluation fails when NFS is temporarily disconnected
> ------------------------------------------------------------------
>
>                 Key: ARTEMIS-3030
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3030
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.16.0
>            Reporter: Apache Dev
>            Priority: Blocker
>
> Same scenario as ARTEMIS-2421.
> If the network between the Live Broker (B1) and the NFS Server is 
> disconnected (for example by rejecting its TCP packets with iptables), 
> after the lock lease timeout this happens:
>  * Backup server (B2) becomes Live
>  * When NFS connectivity of B1 is restored, B1 remains Live
> So both brokers are live.
> The issue seems to be caused by {{java.nio.channels.FileLock#isValid}}, used in 
> {{org.apache.activemq.artemis.core.server.impl.FileLockNodeManager#isLiveLockLost}},
> because it always returns true, even if in the meantime the lock was 
> lost and taken by B2.
> Do you suggest using specific mount options for NFS?
> Or should the lock evaluation be replaced with a more reliable mechanism? We 
> noticed that {{FileLock#isValid}} returns a cached value (true) even 
> when NFS connectivity is down, so it would be better to use a validation 
> mechanism that forces a query to the NFS server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
