[
https://issues.apache.org/jira/browse/ARTEMIS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859941#comment-16859941
]
ASF subversion and git services commented on ARTEMIS-2069:
----------------------------------------------------------
Commit 097ef281fd987579c9a92a50fb1906729b07d5f5 in activemq-artemis's branch
refs/heads/master from Tomas Hofman
[ https://gitbox.apache.org/repos/asf?p=activemq-artemis.git;h=097ef28 ]
ARTEMIS-2069 Backup doesn't activate after shared store is reconnected
> Backup doesn't activate after shared store is reconnected
> ---------------------------------------------------------
>
> Key: ARTEMIS-2069
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2069
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.6.2
> Reporter: Tomas Hofman
> Priority: Major
> Time Spent: 3h
> Remaining Estimate: 0h
>
> *Scenario*
> # Start a live/backup server pair in a dedicated topology with shared-store HA,
> with the journal located on NFS
> # The NFS mount on the backup server fails
> # Reconnect NFS on the backup server
> # Try to shut down the live EAP server
> # The backup doesn't activate
> *What happens*
> The backup waits for the live server to fail by checking its file lock. If the
> connection to shared storage fails, the backup logs the following error.
>
> 05:50:57,896 ERROR [org.apache.activemq.artemis.core.server]
> (AMQ119000: Activation for server
> ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475)
> AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
>     at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) [rt.jar:1.8.0_151]
>     at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) [rt.jar:1.8.0_151]
>     at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) [rt.jar:1.8.0_151]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>
> The exception is caught in {{SharedStoreBackupActivation.run}} and terminates
> the backup activation process.
> If the NFS is reconnected later, the backup server does not resume the
> activation process and no longer waits for the live server to fail. If the live
> server then fails, the backup doesn't activate, even though it has a connection
> to shared storage.
> The backup should keep retrying the live lock check even while the storage is
> unavailable. It should log warning/error messages that the storage is
> unavailable, but it should not terminate the activation process. This would
> allow the backup to resume its duties once the storage is reconnected.
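
The behaviour requested in the description can be illustrated with a minimal Java sketch. This is not the change made in the commit referenced above and it does not use any Artemis classes: `LockRetrySketch`, `LockAttempt`, `awaitLock`, and `retryDelayMillis` are hypothetical names introduced only for illustration. The sketch assumes that an `IOException` during a lock attempt means the shared store is temporarily unreachable, so it logs a warning and keeps polling instead of ending the wait.

```java
import java.io.IOException;

/** Hypothetical sketch of a lock wait that tolerates storage outages; not Artemis code. */
final class LockRetrySketch {

    /** Hypothetical stand-in for a single lock attempt against the shared store. */
    interface LockAttempt {
        /** Returns true if the lock was acquired, false if another holder still has it. */
        boolean tryLock() throws IOException;
    }

    /**
     * Polls until the lock is acquired and returns the number of attempts made.
     * An IOException is logged and retried rather than propagated, so a storage
     * outage does not end the wait. Returns -1 if the thread is interrupted.
     */
    static int awaitLock(LockAttempt attempt, long retryDelayMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                if (attempt.tryLock()) {
                    return attempts;
                }
            } catch (IOException e) {
                // Assumption: an I/O error here means the store is unreachable for now.
                System.err.println("WARN: shared store unavailable, retrying: " + e.getMessage());
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated store: the first two attempts fail with an I/O error, the third succeeds.
        int attempts = awaitLock(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Input/output error");
            }
            return true;
        }, 10L);
        System.out.println("lock acquired after " + attempts + " attempts");
    }
}
```

The point of the sketch is that the retry loop, not the caller, absorbs the I/O failure, so the waiting thread is still alive when the storage comes back.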