[
https://issues.apache.org/jira/browse/AMQ-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847223#comment-17847223
]
Tom Tichy edited comment on AMQ-9482 at 5/17/24 10:39 AM:
----------------------------------------------------------
Here is a thread dump that I managed to get. ActiveMQ was totally out to lunch
and reporting nothing but
{code:java}
WARN | Transport Connection to: tcp://10.0.88.139:xxxxxx failed: CONNECT frame
not received with in connectionTimeout (>30000): tcp://10.0.88.139:xxxxxxx{code}
Fastthread.io analyzer came up with this
# 'Selector Worker: 12' thread is stuck on *_wait()_* method in
*_sun.nio.ch.EPoll_* file. Before getting stuck, this thread obtained *5 locks*
(sun.nio.ch.Util$2 lock, sun.nio.ch.EPollSelectorImpl lock...) and never
released it. Due to that *4480 threads are* BLOCKED as shown in this [+stack
trace+|https://fastthread.io/thread.jsp?tId=MHg0ODMwZQ==&pc=3&dumpId=1&graph=true&ts=2024-05-17T10-07-42].
If threads are BLOCKED for a prolonged period, your application can become
unresponsive.
{panel}
h2. {color:#cc3300}Selector Worker: 12{color}
PRIORITY : 5
THREAD ID : 0X00007F5284422FA0
NATIVE ID : 0X4830E
NATIVE ID (DECIMAL) : 295694
STATE : RUNNABLE
stackTrace:
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait({color:#000080}java.base@{*}17.0.9/Native
Method{*}{color}{*}){*}
at
sun.nio.ch.EPollSelectorImpl.doSelect({color:#000080}java.base@{*}17.0.9/EPollSelectorImpl.java:118{*}{color}{*}){*}
at
sun.nio.ch.SelectorImpl.lockAndDoSelect({color:#000080}java.base@{*}17.0.9/SelectorImpl.java:129{*}{color}{*}){*}
- locked *<0x0000000713110998>* (a sun.nio.ch.Util$2)
- locked *<0x0000000713110948>* (a sun.nio.ch.EPollSelectorImpl)
at
sun.nio.ch.SelectorImpl.select({color:#000080}java.base@{*}17.0.9/SelectorImpl.java:141{*}{color}{*}){*}
at
org.apache.activemq.transport.nio.SelectorWorker.run({color:#000080}SelectorWorker.java:96{color})
at
java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:2037{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor.reject({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:833{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor.execute({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:1365{*}{color}{*}){*}
at
org.apache.activemq.transport.nio.SelectorWorker.<init>({color:#000080}SelectorWorker.java:48{color})
at
org.apache.activemq.transport.nio.SelectorManager.register({color:#000080}SelectorManager.java:121{color})
- locked *<0x000000070092b5e0>* (a
org.apache.activemq.transport.nio.SelectorManager)
at
org.apache.activemq.transport.mqtt.MQTTNIOTransport.initializeStreams({color:#000080}MQTTNIOTransport.java:68{color})
at
org.apache.activemq.transport.tcp.TcpTransport.connect({color:#000080}TcpTransport.java:543{color})
at
org.apache.activemq.transport.mqtt.MQTTNIOTransport.doStart({color:#000080}MQTTNIOTransport.java:142{color})
at
org.apache.activemq.util.ServiceSupport.start({color:#000080}ServiceSupport.java:55{color})
at
org.apache.activemq.transport.TransportFilter.start({color:#000080}TransportFilter.java:64{color})
at
org.apache.activemq.transport.mqtt.MQTTTransportFilter.start({color:#000080}MQTTTransportFilter.java:157{color})
at
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start({color:#000080}MQTTInactivityMonitor.java:148{color})
at
org.apache.activemq.transport.TransportFilter.start({color:#000080}TransportFilter.java:64{color})
at
org.apache.activemq.broker.TransportConnection.start({color:#000080}TransportConnection.java:1080{color})
- locked *<0x0000000713110af8>* (a
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at
org.apache.activemq.broker.TransportConnector$1$1.run({color:#000080}TransportConnector.java:235{color})
at
java.util.concurrent.ThreadPoolExecutor.runWorker({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:1136{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor$Worker.run({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:635{*}{color}{*}){*}
at
java.lang.Thread.run({color:#000080}java.base@{*}17.0.9/Thread.java:840{*}{color}{*}){*}
Locked ownable synchronizers:
- *<0x0000000707a305e0>* (a
java.util.concurrent.ThreadPoolExecutor$Worker){panel}
Other perhaps pertinent info: 99.9% of our connections are MQTT. There is a
custom auth plugin that queries AWS for credentials.
Oh.. and about 4000 of the mqtt devices send info at the same time (not in our
control), so there is a bit of a connection "storm"
Any ideas [~cshannon] ?
[^activemq.tdump]
was (Author: JIRAUSER305154):
Here is a thread dump that I managed to get. ActiveMQ was totally out to lunch
and reporting nothing but
{code:java}
WARN | Transport Connection to: tcp://10.0.88.139:xxxxxx failed: CONNECT frame
not received with in connectionTimeout (>30000): tcp://10.0.88.139:xxxxxxx{code}
Fastthread.io analyzer came up with this
# 'Selector Worker: 12' thread is stuck on *_wait()_* method in
*_sun.nio.ch.EPoll_* file. Before getting stuck, this thread obtained *5 locks*
(sun.nio.ch.Util$2 lock, sun.nio.ch.EPollSelectorImpl lock...) and never
released it. Due to that *4480 threads are* BLOCKED as shown in this [+stack
trace+|https://fastthread.io/thread.jsp?tId=MHg0ODMwZQ==&pc=3&dumpId=1&graph=true&ts=2024-05-17T10-07-42].
If threads are BLOCKED for a prolonged period, your application can become
unresponsive.
{panel}
h2. {color:#cc3300}Selector Worker: 12{color}
PRIORITY : 5
THREAD ID : 0X00007F5284422FA0
NATIVE ID : 0X4830E
NATIVE ID (DECIMAL) : 295694
STATE : RUNNABLE
stackTrace:
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait({color:#000080}java.base@{*}17.0.9/Native
Method{*}{color}{*}){*}
at
sun.nio.ch.EPollSelectorImpl.doSelect({color:#000080}java.base@{*}17.0.9/EPollSelectorImpl.java:118{*}{color}{*}){*}
at
sun.nio.ch.SelectorImpl.lockAndDoSelect({color:#000080}java.base@{*}17.0.9/SelectorImpl.java:129{*}{color}{*}){*}
- locked *<0x0000000713110998>* (a sun.nio.ch.Util$2)
- locked *<0x0000000713110948>* (a sun.nio.ch.EPollSelectorImpl)
at
sun.nio.ch.SelectorImpl.select({color:#000080}java.base@{*}17.0.9/SelectorImpl.java:141{*}{color}{*}){*}
at
org.apache.activemq.transport.nio.SelectorWorker.run({color:#000080}SelectorWorker.java:96{color})
at
java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:2037{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor.reject({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:833{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor.execute({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:1365{*}{color}{*}){*}
at
org.apache.activemq.transport.nio.SelectorWorker.<init>({color:#000080}SelectorWorker.java:48{color})
at
org.apache.activemq.transport.nio.SelectorManager.register({color:#000080}SelectorManager.java:121{color})
- locked *<0x000000070092b5e0>* (a
org.apache.activemq.transport.nio.SelectorManager)
at
org.apache.activemq.transport.mqtt.MQTTNIOTransport.initializeStreams({color:#000080}MQTTNIOTransport.java:68{color})
at
org.apache.activemq.transport.tcp.TcpTransport.connect({color:#000080}TcpTransport.java:543{color})
at
org.apache.activemq.transport.mqtt.MQTTNIOTransport.doStart({color:#000080}MQTTNIOTransport.java:142{color})
at
org.apache.activemq.util.ServiceSupport.start({color:#000080}ServiceSupport.java:55{color})
at
org.apache.activemq.transport.TransportFilter.start({color:#000080}TransportFilter.java:64{color})
at
org.apache.activemq.transport.mqtt.MQTTTransportFilter.start({color:#000080}MQTTTransportFilter.java:157{color})
at
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start({color:#000080}MQTTInactivityMonitor.java:148{color})
at
org.apache.activemq.transport.TransportFilter.start({color:#000080}TransportFilter.java:64{color})
at
org.apache.activemq.broker.TransportConnection.start({color:#000080}TransportConnection.java:1080{color})
- locked *<0x0000000713110af8>* (a
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at
org.apache.activemq.broker.TransportConnector$1$1.run({color:#000080}TransportConnector.java:235{color})
at
java.util.concurrent.ThreadPoolExecutor.runWorker({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:1136{*}{color}{*}){*}
at
java.util.concurrent.ThreadPoolExecutor$Worker.run({color:#000080}java.base@{*}17.0.9/ThreadPoolExecutor.java:635{*}{color}{*}){*}
at
java.lang.Thread.run({color:#000080}java.base@{*}17.0.9/Thread.java:840{*}{color}{*}){*}
Locked ownable synchronizers:
- *<0x0000000707a305e0>* (a
java.util.concurrent.ThreadPoolExecutor$Worker){panel}
Other perhaps pertinent info: 99.9% of our connections are MQTT. There is a
custom auth plugin that queries AWS for credentials
Any ideas [~cshannon] ?
[^activemq.tdump]
> Broker crashes after runaway threads spawn
> ------------------------------------------
>
> Key: AMQ-9482
> URL: https://issues.apache.org/jira/browse/AMQ-9482
> Project: ActiveMQ Classic
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.17.6, 6.0.1
> Environment: Bitnami created AMI in AWS
> Reporter: Tom Tichy
> Priority: Major
> Attachments: activemq.tdump, brokerInfo-after-crash-redacted.json
>
>
> Running on Bitnami created AMI in AWS. The broker has about 7000 devices
> connected via MQTT. Each devices has its own topic name.
> Broker stays up for about 4-5 days before being hobbled and unable to create
> any new tasks/accept any new connections.
> (There is identical setup for staging environment with about 100 devices
> connected. It runs without any issues.)
> I have troubleshot the cause to be the systemd task limit. The current
> `TasksMax` is 18100. When running normally, the number of tasks is about 300.
> Then (every 4-5 days) there is a quick spike to the max 18100 tasks and it
> stays there never coming back down. The result is that the broker just sits
> there, does nothing useful and keeps logging the following message
>
> {code:java}
> [659914.788s][warning][os,thread] Failed to start thread "Unknown thread" -
> pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, g
> uardsize: 0k, detached.
> [659914.788s][warning][os,thread] Failed to start the native thread for
> java.lang.Thread "ActiveMQ BrokerService[localhost] Task-281805"
> ERROR | Scheduled task error
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of
> memory or process/resource limits reached
> at java.lang.Thread.start0(Native Method) ~[?:?]
> at java.lang.Thread.start(Thread.java:809) ~[?:?]
> at
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
> ~[?:?]
> at
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
> ~[?:?]
> at
> org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173)
> ~[activemq-client-6.0.1.jar:6.0.1]
> at
> org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165)
> ~[activemq-client-6.0.1.jar:6.0.1]
> at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820)
> ~[activemq-broker-6.0.1.jar:6.0.1]
> at
> org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39)
> ~[activemq-client-6.0.1.jar:6.0.1]
> at java.util.TimerThread.mainLoop(Timer.java:566) ~[?:?]
> at java.util.TimerThread.run(Timer.java:516) ~[?:?]
> Exception in thread "ActiveMQ Broker[localhost] Scheduler"
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of
> memory or process/resource limits reached
> at java.base/java.lang.Thread.start0(Native Method)
> at java.base/java.lang.Thread.start(Thread.java:809)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
> at
> org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:173)
> at
> org.apache.activemq.thread.TaskRunnerFactory.execute(TaskRunnerFactory.java:165)
> at org.apache.activemq.broker.region.Topic$7.run(Topic.java:820)
> at
> org.apache.activemq.thread.SchedulerTimerTask.run(SchedulerTimerTask.java:39)
> at java.base/java.util.TimerThread.mainLoop(Timer.java:566)
> at java.base/java.util.TimerThread.run(Timer.java:516)
> {code}
>
> The start command is
> {code:java}
> /opt/bitnami/java/bin/java -Xms2G -Xmx4G
> -Djava.util.logging.config.file=logging.properties
> -Djava.security.auth.login.config=/opt/bitnami/activemq/conf/login.config
> -Dorg.apache.activemq.UseDedicatedTaskRunner=false
> -Dcom.sun.management.jmxremote -Djava.awt.headless=true
> -Djava.io.tmpdir=/opt/bitnami/activemq/tmp --add-reads=java.xml=java.logging
> --add-opens java.base/java.security=ALL-UNNAMED --add-opens
> java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED
> --add-opens java.base/java.util=ALL-UNNAMED --add-opens
> java.naming/javax.naming.spi=ALL-UNNAMED --add-opens
> java.rmi/sun.rmi.transport.tcp=ALL-UNNAMED --add-opens
> java.base/java.util.concurrent=ALL-UNNAMED --add-opens
> java.base/java.util.concurrent.atomic=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.http=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.https=ALL-UNNAMED
> --add-exports=java.base/sun.net.www.protocol.jar=ALL-UNNAMED
> --add-exports=jdk.xml.dom/org.w3c.dom.html=ALL-UNNAMED
> --add-exports=jdk.naming.rmi/com.sun.jndi.url.rmi=ALL-UNNAMED
> -Dactivemq.classpath=/opt/bitnami/activemq/conf:/opt/bitnami/activemq/../lib/:
> -Dactivemq.home=/opt/bitnami/activemq -Dactivemq.base=/opt/bitnami/activemq
> -Dactivemq.conf=/opt/bitnami/activemq/conf
> -Dactivemq.data=/opt/bitnami/activemq/data
> -Djolokia.conf=file:/opt/bitnami/activemq/conf/jolokia-access.xml -jar
> /opt/bitnami/activemq/bin/activemq.jar start {code}
> During the error condition, I am able to collect broker information via
> jolokia: [^brokerInfo-after-crash-redacted.json]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)