[
https://issues.apache.org/jira/browse/ARTEMIS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422949#comment-17422949
]
Justin Bertram edited comment on ARTEMIS-3505 at 9/30/21, 6:35 PM:
-------------------------------------------------------------------
[~jbertram], the previous ticket related to this same issue was ARTEMIS-3355.
It was not opened by me; it was opened by one of our team members, Mahendra
Sonawale.
was (Author: ekta-awasthi):
Hello Justin,
The previous ticket related to the same issue was NOT opened by me; it was
opened by one of our team members, Mahendra Sonawale. Below is the related
ticket, and I have also copied your comment from that ticket and pasted it
here. See the very last line. Thanks
https://issues.apache.org/jira/browse/ARTEMIS-3355
This looks like a "soft" deadlock which was properly caught by the "[critical
analyzer|https://activemq.apache.org/components/artemis/documentation/latest/critical-analysis.html]."
The critical analyzer is a safeguard that catches nasty issues like this and
shuts down the broker so that it can be restarted and restored to proper
working order, rather than sitting in a deadlocked state, potentially forever,
while clients are unable to perform their work.
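For reference, the critical analyzer's behavior can be tuned in {{broker.xml}}. A minimal sketch follows; the element names come from the critical-analysis documentation linked above, but the values shown are illustrative, not a recommendation:

```xml
<core xmlns="urn:activemq:core">
   <!-- enable or disable the critical analyzer -->
   <critical-analyzer>true</critical-analyzer>
   <!-- how long (ms) a critical component may stall before the policy fires -->
   <critical-analyzer-timeout>120000</critical-analyzer-timeout>
   <!-- how often (ms) the analyzer checks; typically half the timeout -->
   <critical-analyzer-check-period>60000</critical-analyzer-check-period>
   <!-- HALT, SHUTDOWN, or LOG; killing the VM as seen here matches HALT -->
   <critical-analyzer-policy>HALT</critical-analyzer-policy>
</core>
```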
{{Thread-38676}} was blocked and triggered the failure because it was in a
section of code deemed "critical" to broker performance (i.e., the
{{TimedBuffer}}, which flushes data to disk):
{noformat}
"Thread-38676 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@5f5effb0)" Id=243147 BLOCKED on org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5 owned by "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@41b13f3d)" Id=125
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.ensureMessageDataScanned(AMQPMessage.java:572)
    -  blocked on org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPMessage.getExpiration(AMQPMessage.java:962)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessagePersister.encode(AMQPLargeMessagePersister.java:97)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessagePersister.encode(AMQPLargeMessagePersister.java:32)
    at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.encode(JournalAddRecord.java:72)
    at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.addBytes(TimedBuffer.java:321)
    -  locked org.apache.activemq.artemis.core.io.buffer.TimedBuffer@40639fab
    at org.apache.activemq.artemis.core.io.AbstractSequentialFile.write(AbstractSequentialFile.java:231)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2937)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:92)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$39/2124562732.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@556802c0
{noformat}
{{Thread-38676}} was blocked waiting on {{Thread-16}}:
{noformat}
"Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@41b13f3d)" Id=125 WAITING on java.util.concurrent.CountDownLatch$Sync@2f662f7b
    at sun.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.CountDownLatch$Sync@2f662f7b
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1155)
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:989)
    at org.apache.activemq.artemis.core.replication.ReplicatedJournal.appendDeleteRecord(ReplicatedJournal.java:233)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:359)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1620)
    -  locked org.apache.activemq.artemis.protocol.amqp.broker.AMQPLargeMessage@3a185aa5
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1562)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:1191)
    at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:1063)
    at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2172)
    -  locked org.apache.activemq.artemis.core.server.impl.ServerSessionImpl@18f48d32
    at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.send(ServerSessionImpl.java:1812)
    -  locked org.apache.activemq.artemis.core.server.impl.ServerSessionImpl@18f48d32
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback.inSessionSend(AMQPSessionCallback.java:563)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback.lambda$serverSend$2(AMQPSessionCallback.java:522)
    at org.apache.activemq.artemis.protocol.amqp.broker.AMQPSessionCallback$$Lambda$275/60269086.run(Unknown Source)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$39/2124562732.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@4fd856a6
{noformat}
However, {{Thread-16}} itself is waiting _indefinitely_ in
{{java.util.concurrent.CountDownLatch#await()}}. Unfortunately, this call will
never return: it is waiting on a task that is itself blocked by
{{Thread-38676}}, since both are run by the same ordered executor.
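The failure mode above can be reproduced in miniature with a plain single-threaded executor standing in for the ordered executor. This is an illustrative sketch, not the broker's actual code; the class and method names here are made up:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class OrderedExecutorDeadlockDemo {

    // Returns true if the first task saw the latch released in time,
    // false if it deadlocked waiting on the second task.
    static boolean demo() throws InterruptedException {
        // A single-threaded executor models an ordered executor:
        // submitted tasks run strictly one at a time, in submission order.
        ExecutorService ordered = Executors.newSingleThreadExecutor();
        CountDownLatch latch = new CountDownLatch(1);
        AtomicBoolean releasedInTime = new AtomicBoolean();

        ordered.submit(() -> {
            // Like Thread-16: while running, schedule more work on the SAME
            // executor and then block waiting on it. The second task cannot
            // start while we wait, because the executor's only thread is
            // parked right here.
            ordered.submit(latch::countDown);
            try {
                // The broker waits indefinitely; this demo uses a timeout
                // so that it terminates and the deadlock can be observed.
                releasedInTime.set(latch.await(500, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        ordered.shutdown();
        ordered.awaitTermination(5, TimeUnit.SECONDS);
        return releasedInTime.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "latch released in time: false"
        System.out.println("latch released in time: " + demo());
    }
}
```

With two executor threads (or the countdown scheduled on a different executor) the latch would be released immediately, which is essentially what the ARTEMIS-3327 fix restores.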
This issue has already been resolved by the commit for ARTEMIS-3327. It will be
available in 2.18.0.
> Activemq Broker Keeps Crashing
> ------------------------------
>
> Key: ARTEMIS-3505
> URL: https://issues.apache.org/jira/browse/ARTEMIS-3505
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.18.0
> Environment: DEV/UAT/PROD
> Reporter: Ekta
> Priority: Critical
> Attachments: samplebroker.xml, threadDump.txt
>
>
> Hello All,
>
> We have noticed the same problem that we reported earlier with 2.17 and were
> told would be fixed in version 2.18. We have recently moved all our
> environments to 2.18 and can see the problem still exists across all of them.
>
> We have the following architecture with respect to the ActiveMQ master/slave setup.
> {noformat}
> producer/consumer --> Apache QPID (1.14) --> Artemis 2.18 (master/slave)
> {noformat}
> Basically, we see our master and slave brokers going down abruptly with the
> log below. I have also attached a thread dump for analysis to see if anyone
> can spot anything; we can see it is related to some kind of concurrent
> deadlock. Please go through the attached logs and share any feedback.
> The log entry that triggers the issue is highlighted below; as soon as the
> broker prints this, it prints "The Critical Analyzer detected slow paths on
> the broker" and then "AMQ224079: The process for the virtual machine will
> be killed."
> {noformat}
> 2021-09-29 10:37:43,327 WARN  [org.apache.activemq.artemis.utils.critical.CriticalMeasure] Component org.apache.activemq.artemis.core.io.buffer.TimedBuffer is expired on path 4
> {noformat}
> It has been happening quite frequently now, and we need to get to the bottom
> of this.
>
> Appreciate everyone's effort on this. [^threadDump.txt]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)