[ 
https://issues.apache.org/jira/browse/ARTEMIS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602200#comment-16602200
 ] 

Tomas Hofman commented on ARTEMIS-2069:
---------------------------------------

PR: https://github.com/apache/activemq-artemis/pull/2287

> Backup doesn't activate after shared store is reconnected
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-2069
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2069
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.6.2
>            Reporter: Tomas Hofman
>            Priority: Major
>
> *Scenario*
>  # Start live backup server pair in dedicated topology with shared store HA, 
> with journal located on NFS
>  # NFS mounted on backup server fails
>  # Reconnect NFS on backup server
>  # Try to shut down live EAP server
>  # Backup doesn't activate
> *What happens*
>  Backup is waiting for live to fail by checking its file lock. In case the 
> connection to shared storage fails, backup logs following error.
>  
> |{color:#000000}05:50:57,896 ERROR [org.apache.activemq.artemis.core.server] 
> (AMQ119000: Activation for server 
> ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475) 
> AMQ224000: Failure in initialisation: java.io.IOException: Input/output 
> error{color}|
> |{color:#000000} at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) 
> [rt.jar:1.8.0_151]{color}|
> |{color:#000000} at 
> sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) 
> [rt.jar:1.8.0_151]{color}|
> |{color:#000000} at 
> sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) 
> [rt.jar:1.8.0_151]{color}|
> |{color:#000000} at 
> org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299)
>  [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]{color}|
> |{color:#000000} at 
> org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316)
>  [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]{color}|
> |{color:#000000} at 
> org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127)
>  [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]{color}|
> |{color:#000000} at 
> org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77)
>  [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]{color}|
> |{color:#000000} at 
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496)
>  [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]{color}|
> | |
>  
> Exception is caught in {{SharedStoreBackupActivation.run}}, and causes 
> termination of backup activation process.
> In case the NFS is reconnected later, backup server doesn't continue in 
> activation process and it doesn't wait for live to fail. In case the live 
> fails, backup doesn't activate, even though it has a connection to shared 
> storage.
> Backup should retry checking live lock even in case the storage is 
> unavailable. It should log warning/error messages that storage is 
> unavailable, but it should not terminate the activation process. This would 
> allow backup to continue its duties when the storage is reconnected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to