Suman Moorthy created ARTEMIS-3076:
--------------------------------------

             Summary: Artemis Master node not starting after failover to Slave
                 Key: ARTEMIS-3076
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-3076
             Project: ActiveMQ Artemis
          Issue Type: Bug
    Affects Versions: 2.11.0
            Reporter: Suman Moorthy


I have Artemis (version 2.11.0) configured for HA (Master and Slave).

The Master node goes down for an unknown reason, and the log below is printed 
continuously:
{code:java}
2021-01-15 23:02:05,414 WARN  [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
2021-01-15 23:02:05,414 WARN  [org.apache.activemq.artemis.core.server] AMQ222154: Error checking DLQ: ActiveMQShutdownException[errorType=SHUTDOWN_ERROR message=Journal must be in state=LOADED, was [STOPPED]]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:1087) [artemis-journal-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendUpdateRecord(JournalImpl.java:886) [artemis-journal-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.journal.Journal.appendUpdateRecord(Journal.java:98) [artemis-journal-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.updateDeliveryCount(AbstractJournalStorageManager.java:756) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkRedelivery(QueueImpl.java:3052) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.RefsOperation.rollbackRedelivery(RefsOperation.java:166) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterRollback(RefsOperation.java:113) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterRollback(TransactionImpl.java:589) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$200(TransactionImpl.java:40) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$4.done(TransactionImpl.java:442) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:244) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.11.0.jar:2.11.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_275]
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.11.0.jar:2.11.0]
{code}
 

The Slave comes up as expected, but throws an NPE:
{noformat}

2021-01-15 23:02:27,529 INFO  [org.apache.activemq.artemis.core.server] AMQ221010: Backup Server is now live
2021-01-15 23:02:27,545 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.NullPointerException
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation$FailbackChecker.<init>(SharedStoreBackupActivation.java:193) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.startFailbackChecker(SharedStoreBackupActivation.java:185) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.SharedStoreBackupActivation.run(SharedStoreBackupActivation.java:118) [artemis-server-2.11.0.jar:2.11.0]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3863) [artemis-server-2.11.0.jar:2.11.0]
{noformat}
The Master attempts to start, but it doesn't progress beyond 
*"AMQ221034: Waiting indefinitely to obtain live lock"*.
The logs stay stuck at this point even after multiple restarts:
{noformat}
2021-01-15 23:03:56,238 INFO  [org.apache.activemq.artemis.core.server] AMQ221006: Waiting to obtain live lock
2021-01-15 23:03:56,300 INFO  [org.apache.activemq.artemis.core.server] AMQ221013: Using NIO Journal
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2021-01-15 23:03:56,581 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\cd776bae-1a55-11eb-985d-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a84f1e4f-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,644 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory \\test\data\paging\a87edff5-1f1a-11eb-a37f-0050569136c8 did not have an identification file address.txt
2021-01-15 23:03:56,988 INFO  [org.apache.activemq.artemis.core.server] AMQ221034: Waiting indefinitely to obtain live lock
{noformat}
 

Can you please advise on what the issue is here and on the steps to recover?

Does the NPE during Slave start-up have any effect on the queues or on the broker's functioning?

Do I need to stop the Slave manually to get the Master to start successfully?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
