Tomas Hofman created ARTEMIS-2069:
-------------------------------------

             Summary: Backup doesn't activate after shared store is reconnected
                 Key: ARTEMIS-2069
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2069
             Project: ActiveMQ Artemis
          Issue Type: Bug
    Affects Versions: 2.6.2
            Reporter: Tomas Hofman


*Scenario*
 # Start a live/backup server pair in a dedicated topology with shared-store HA, 
with the journal located on NFS
 # The NFS mount on the backup server fails
 # Reconnect NFS on the backup server
 # Try to shut down the live EAP server
 # The backup doesn't activate

*What happens*
 The backup waits for the live server to fail by checking the live server's 
file lock. If the connection to shared storage fails, the backup logs the 
following error:

 
05:50:57,896 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475) AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
    at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) [rt.jar:1.8.0_151]
    at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) [rt.jar:1.8.0_151]
    at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) [rt.jar:1.8.0_151]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]

 

The exception is caught in {{SharedStoreBackupActivation.run}} and terminates 
the backup activation process.

If NFS is reconnected later, the backup server doesn't resume the activation 
process and no longer waits for the live server to fail. If the live server 
then fails, the backup doesn't activate, even though its connection to shared 
storage has been restored.

The backup should keep retrying the live lock check even while the storage is 
unavailable. It should log warning/error messages that the storage is 
unavailable, but it should not terminate the activation process. This would 
allow the backup to resume its duties once the storage is reconnected.
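
The requested behavior can be illustrated with a minimal, self-contained sketch. 
This is not Artemis code and not a proposed patch: {{RetryingLockWait}}, 
{{LockProbe}} and {{awaitLiveLock}} are hypothetical names, and the probe 
stands in for whatever single lock attempt the node manager performs. The only 
point is that an {{IOException}} from the lock attempt is logged and retried 
instead of ending the wait.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingLockWait {

    /** Hypothetical stand-in for one attempt to take the live server's file lock. */
    @FunctionalInterface
    interface LockProbe {
        /** Returns true once the lock is obtained, false while live still holds it. */
        boolean tryAcquire() throws IOException;
    }

    /**
     * Polls the probe until the lock is obtained. An IOException is treated as
     * "shared storage unavailable": it is logged and the loop continues, so the
     * wait survives a temporary storage outage. Returns the number of attempts.
     */
    static int awaitLiveLock(LockProbe probe, long retryDelayMillis) throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                if (probe.tryAcquire()) {
                    return attempts;
                }
            } catch (IOException e) {
                // Assumption: a warning is enough here; a real server would use its logger.
                System.err.println("WARN: shared store unavailable, will retry: " + e.getMessage());
            }
            Thread.sleep(retryDelayMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated storage: two I/O failures, one "live still holds the lock", then success.
        LockProbe probe = () -> {
            int n = calls.incrementAndGet();
            if (n <= 2) {
                throw new IOException("Input/output error");
            }
            return n > 3;
        };
        int attempts = awaitLiveLock(probe, 10);
        System.out.println("Lock acquired after " + attempts + " attempts");
    }
}
```

In this simulation the wait completes on the fourth attempt. A real fix would 
presumably also need to honor server shutdown and interruption while waiting, 
which this sketch only covers through {{InterruptedException}}.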



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
