[jira] [Commented] (ARTEMIS-2069) Backup doesn't activate after shared store is reconnected

ASF GitHub Bot (JIRA) Mon, 03 Sep 2018 06:59:22 -0700


    [ https://issues.apache.org/jira/browse/ARTEMIS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602198#comment-16602198 ]


ASF GitHub Bot commented on ARTEMIS-2069:
-----------------------------------------

GitHub user TomasHofman opened a pull request:

    https://github.com/apache/activemq-artemis/pull/2287

    ARTEMIS-2069 Backup doesn't activate after shared store is reconnected

    https://issues.apache.org/jira/browse/ARTEMIS-2069
    https://issues.jboss.org/browse/WFLY-10968
    https://issues.jboss.org/browse/JBEAP-15343
    
    The fix tries to prevent the server activation thread from terminating
    when FileLockNodeManager.tryLock() throws an IOException, e.g. because
    an NFS directory is temporarily inaccessible.
    The node manager will repeat the tryLock() call every two seconds.
    A WARN message with a stack trace will be printed on the first failure;
    DEBUG messages will be printed on recurring failures.
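
    The retry behaviour described above can be sketched roughly as follows.
    This is a minimal illustration, not the code from the pull request:
    RetryingLockSketch, LockAttempt, awaitLock and the in-memory LOG list are
    hypothetical names, and the choice not to reset the WARN/DEBUG distinction
    after a successful check is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a retrying lock check; not the actual Artemis code. */
final class RetryingLockSketch {

    /** Stand-in for a lock attempt such as FileLockNodeManager.tryLock(). */
    interface LockAttempt {
        /** Returns true once the lock is acquired; may fail with an I/O error. */
        boolean tryLock() throws IOException;
    }

    /** Collects log lines in memory so the example needs no logging library. */
    static final List<String> LOG = new ArrayList<>();

    /**
     * Repeats the lock attempt until it succeeds, sleeping between attempts.
     * An IOException no longer ends the loop: the first one is logged at WARN
     * (with the exception), later ones at DEBUG.
     * Returns true when the lock was acquired, false if the thread was interrupted.
     */
    static boolean awaitLock(LockAttempt attempt, long retryDelayMillis) {
        boolean warned = false;
        while (true) {
            try {
                if (attempt.tryLock()) {
                    return true;
                }
            } catch (IOException e) {
                if (!warned) {
                    LOG.add("WARN lock check failed, will retry: " + e);
                    warned = true;
                } else {
                    LOG.add("DEBUG lock check still failing: " + e.getMessage());
                }
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException ie) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

    Passing 2000 as retryDelayMillis would correspond to the two-second
    interval mentioned in the description.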

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TomasHofman/activemq-artemis JBEAP-14032-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/activemq-artemis/pull/2287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2287
    
----
commit 162e71816e54137534c0bc8b2c3d6c85f941917d
Author: Tomas Hofman <thofman@...>
Date:   2018-09-03T13:47:03Z

    ARTEMIS-2069 Backup doesn't activate after shared store is reconnected

----


> Backup doesn't activate after shared store is reconnected
> ---------------------------------------------------------
>
>                 Key: ARTEMIS-2069
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2069
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.6.2
>            Reporter: Tomas Hofman
>            Priority: Major
>
> *Scenario*
>  # Start a live/backup server pair in a dedicated topology with shared-store 
> HA, with the journal located on NFS
>  # The NFS mount on the backup server fails
>  # Reconnect NFS on the backup server
>  # Try to shut down the live EAP server
>  # The backup doesn't activate
> *What happens*
>  The backup waits for the live server to fail by checking its file lock. If 
> the connection to shared storage fails, the backup logs the following error:
>  
> 05:50:57,896 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=836c9b1e-f067-11e7-8763-001b21862475) AMQ224000: Failure in initialisation: java.io.IOException: Input/output error
>     at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) [rt.jar:1.8.0_151]
>     at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:90) [rt.jar:1.8.0_151]
>     at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1115) [rt.jar:1.8.0_151]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.tryLock(FileLockNodeManager.java:299) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.lock(FileLockNodeManager.java:316) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.FileLockNodeManager.awaitLiveNode(FileLockNodeManager.java:127) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:77) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
>  
> The exception is caught in {{SharedStoreBackupActivation.run}} and 
> terminates the backup activation process.
> If NFS is reconnected later, the backup server does not resume the 
> activation process and no longer waits for the live server to fail. If the 
> live server then fails, the backup doesn't activate, even though it has a 
> connection to shared storage.
> The backup should keep retrying the live lock check even while the storage 
> is unavailable. It should log warning/error messages saying that the storage 
> is unavailable, but it should not terminate the activation process. This 
> would allow the backup to resume its duties once the storage is reconnected.
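
To illustrate why the two failure modes need different handling, here is a
small sketch using the same java.nio call that appears in the stack trace
above. It is not Artemis code: LockProbeSketch, Result and probe are
hypothetical names. FileChannel.tryLock() returns null when another process
holds the lock (the normal "live is still running" case) and throws an
IOException when the file system itself fails, which is the case the
activation thread should survive.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical probe that classifies the outcome of one lock attempt. */
final class LockProbeSketch {

    enum Result { ACQUIRED, HELD_BY_OTHER, STORAGE_UNAVAILABLE }

    /**
     * Tries once to lock the given file and reports what happened.
     * The lock is released again immediately; this is only a probe.
     */
    static Result probe(Path lockFile) {
        try (FileChannel channel = FileChannel.open(
                lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                // Another process holds the lock: keep waiting as usual.
                return Result.HELD_BY_OTHER;
            }
            lock.release();
            return Result.ACQUIRED;
        } catch (IOException e) {
            // Storage problem (e.g. missing or unreachable directory):
            // a caller could log this and retry later instead of giving up.
            return Result.STORAGE_UNAVAILABLE;
        }
    }
}
```

A caller that treats STORAGE_UNAVAILABLE like HELD_BY_OTHER (log, wait,
retry) would keep monitoring the lock across a temporary outage.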



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
