[
https://issues.apache.org/jira/browse/ARTEMIS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander updated ARTEMIS-4114:
-------------------------------
Description:
A broker deadlocks when another broker in the cluster is restarted.
In a cluster of 4 brokers, restarting one broker causes a second broker to
restart as well.
The brokers are connected via staticConnectors, and a scaleDown policy is also configured:
{code:xml}
<ha-policy>
  <live-only>
    <scale-down>
      <connectors>
        <connector-ref>ART.EL.CLS1-connector</connector-ref>
        <connector-ref>ART.EL.CLS2-connector</connector-ref>
        <connector-ref>ART.EL.CLS3-connector</connector-ref>
      </connectors>
    </scale-down>
  </live-only>
</ha-policy>
{code}
Logs of the broker that went down:
{code:java}
Deadlock detected!
"Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82 BLOCKED on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03 owned by "Thread-142 (ActiveMQ-client-global-threads)" Id=10066
    at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.handle(BridgeImpl.java:620)
    - blocked on org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
    at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:3897)
    - locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
    at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:3061)
    - locked org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
    at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:4205)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 2
    - java.util.concurrent.ThreadPoolExecutor$Worker@ffceecd
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@561fd6c1

"Thread-142 (ActiveMQ-client-global-threads)" Id=10066 BLOCKED on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573 owned by "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@46cc127b)" Id=82
    at org.apache.activemq.artemis.core.server.impl.QueueImpl.iterQueue(QueueImpl.java:2158)
    - blocked on org.apache.activemq.artemis.core.server.impl.QueueImpl@59041573
    at org.apache.activemq.artemis.core.server.impl.QueueImpl.moveReferencesBetweenSnFQueues(QueueImpl.java:2649)
    at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.scaleDown(BridgeImpl.java:746)
    - locked org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionBridge@62661d03
    at org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl.connectionFailed(BridgeImpl.java:728)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.callSessionFailureListeners(ClientSessionFactoryImpl.java:774)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:709)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:544)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$600(ClientSessionFactoryImpl.java:75)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1317)
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:78)
    at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:222)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:1091)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$134/0x00000008002b5840.run(Unknown Source)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 3
    - java.util.concurrent.ThreadPoolExecutor$Worker@21768e7
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@32848485
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@6efeeadb
{code}
Logs of the restarted broker and of the broker that went down are attached.
The second broker went down two minutes after the restart.
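To make the trace easier to read: Thread-16 holds the QueueImpl monitor (taken in QueueImpl.deliver) and waits for the ClusterConnectionBridge monitor in BridgeImpl.handle, while Thread-142 holds the ClusterConnectionBridge monitor (taken in BridgeImpl.scaleDown) and waits for the QueueImpl monitor in QueueImpl.iterQueue. Below is a minimal sketch of that lock-order inversion. It is not Artemis code: the class name, the thread names and the two lock objects are hypothetical stand-ins for the objects in the trace. The detection step uses the standard JDK ThreadMXBean API and is not meant to describe how the broker itself produces the "Deadlock detected!" message.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order,
     * then polls the JVM until it reports a monitor deadlock.
     * Returns the number of deadlocked threads found (0 if none within ~5 s).
     */
    static int provokeAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors in the trace.
        final Object queueLock = new Object();   // role of QueueImpl@59041573
        final Object bridgeLock = new Object();  // role of ClusterConnectionBridge@62661d03
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Same order as Thread-16: queue first, then bridge.
        Thread delivery = new Thread(() -> {
            synchronized (queueLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (bridgeLock) { /* never reached */ }
            }
        }, "delivery");

        // Same order as Thread-142: bridge first, then queue.
        Thread failure = new Thread(() -> {
            synchronized (bridgeLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (queueLock) { /* never reached */ }
            }
        }, "connection-failure");

        // Daemon threads so the stuck threads do not keep the JVM alive.
        delivery.setDaemon(true);
        failure.setDaemon(true);
        delivery.start();
        failure.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    // Signals that the caller holds its first lock and waits for the other thread.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```

In this sketch the cycle disappears if both threads take the two monitors in the same order. Whether that is a practical change for BridgeImpl and QueueImpl is not something the trace alone can answer.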
> Broker deadlock occurs when restarting another broker in the cluster
> --------------------------------------------------------------------
>
> Key: ARTEMIS-4114
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4114
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.19.1
> Reporter: Alexander
> Priority: Critical
> Attachments: fallen_broker_logs.txt, restarted_broker_logs.txt
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)