[
https://issues.apache.org/jira/browse/ARTEMIS-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914302#comment-17914302
]
Shirsh Kumar commented on ARTEMIS-4947:
---------------------------------------
Hello,
I tested with the latest version, 2.39.0,
and found the root cause of the sudden OOM when load increases.
The client traffic is 16,000 new client requests per minute, and the
clients are ephemeral as before.
*Scenario 1:*
2 GB RAM, 2 cores, connectionsAllowed: 6,000 -> works fine
*Scenario 2:*
4 GB RAM, 6 cores, connectionsAllowed: 15,000 -> gets stuck at synchronized
methods and throws OOM within 20-30 minutes
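To put the two connectionsAllowed limits in context, here is a rough estimate of how many connections the clients would try to hold open at once, using Little's law (concurrent connections = arrival rate x time connected). This is only a sketch: the class and method names are made up for illustration, and the 1 minute lifetime is an assumption carried over from the original load test, not something measured in this run.

```java
class LoadEstimate {
    /** Little's law: average concurrent connections = arrival rate x average time connected. */
    static long steadyStateConnections(long newClientsPerMinute, double minutesConnected) {
        return Math.round(newClientsPerMinute * minutesConnected);
    }

    public static void main(String[] args) {
        long perMinute = 16_000;       // arrival rate reported in this comment
        double lifetimeMinutes = 1.0;  // assumption: clients stay connected about 1 minute
        long demand = steadyStateConnections(perMinute, lifetimeMinutes);

        // Compare the estimated demand with the limits used in Scenario 1 and Scenario 2.
        for (long limit : new long[] {6_000, 15_000}) {
            System.out.printf("connectionsAllowed=%d, estimated demand=%d, over limit=%b%n",
                    limit, demand, demand > limit);
        }
    }
}
```

If clients stay connected closer to 2 minutes, the same formula gives roughly twice the demand. The estimate says nothing about why one scenario fails and the other does not; it only shows how far the offered load is from each limit.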
Threads are getting stuck at synchronized methods (example below):
"38,927 instances of “org.apache.activemq.artemis.core.server.impl.QueueImpl”,
loaded by “java.net.URLClassLoader @ 0x75a3e70e8” occupy 1.36 GB (52.47%) bytes"
Stack trace:
org.apache.activemq.artemis.core.server.management.impl.ManagementServiceImpl.sendNotification(Lorg/apache/activemq/artemis/core/server/management/Notification;)V
(ManagementServiceImpl.java:779)
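One way to check how many threads are blocked on a monitor, and on which lock, is the standard java.lang.management API. The sketch below is illustrative only: the class name BlockedThreadReport is hypothetical, and as written it inspects the JVM it runs in, so to look at a broker it would have to run inside the broker process or be adapted to a remote JMX connection (both are assumptions, not something tried in this report).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

class BlockedThreadReport {
    /** Counts threads in state BLOCKED, grouped by the monitor they are waiting to enter. */
    static Map<String, Integer> blockedByLock(ThreadInfo[] infos) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : infos) {
            // Entries can be null if a thread ended while the dump was taken.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                String lock = info.getLockName() == null ? "unknown" : info.getLockName();
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Snapshot of all live threads in the current JVM, without lock ownership details.
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        blockedByLock(infos).forEach(
                (lock, n) -> System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

A large count against a single lock name would be consistent with threads queuing behind one synchronized method, but it does not by itself show which thread holds the lock.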
> Out of Memory on too many connecting/disconnecting MQTT clients
> ---------------------------------------------------------------
>
> Key: ARTEMIS-4947
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4947
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Affects Versions: 2.35.0
> Reporter: Shirsh Kumar
> Priority: Major
> Attachments: 2.36-ss-0.hprof.7z, Histogram_otherObjects.png,
> NettyAcceptor_ConnectionsAllowed_2000.png, QueueImpl_Objects.png, broker.xml,
> dump_oom.7z.001, dump_oom.7z.002, histogram.png, leftover_client_data.png,
> msgloadtest.7z, single-pod-oom-heap-dump.7z, single-pod-oom-thread-dump.dump
>
>
> I was trying to use Artemis ActiveMQ broker with Kubernetes using Artemis
> cloud's
> [activemq-artemis-operator|https://github.com/artemiscloud/activemq-artemis-operator]
>
>
> *Usecase* (Short Lived Clients): It consists of clients connecting to broker,
> subscribing to topics and disconnecting after 1 or 2 minutes.
> The cluster uses the [Artemis Kubernetes image
> 1.0.28|https://github.com/artemiscloud/activemq-artemis-broker-kubernetes-image]
> with underlying [Artemis broker version
> 2.35.0|https://activemq.apache.org/components/artemis/download/release-notes-2.35.0]
>
> Kubernetes Pods config:
> CPU: 1
> Memory: 2 Gi
> Max ConnectionsAllowed: 2000
>
> So, to test the stability of the system, a load test was run on the cluster
> with an initial 2 pods and scaling allowed up to 3 pods.
> I assumed that limiting ConnectionsAllowed to 2000 would be enough to keep
> the system stable.
> Persistence had to be disabled because I was getting a 5000 ms timeout while
> writing session state to disk (see attached broker.xml).
>
> *Load test:* 7000 new client connections per minute (Client IDs = some
> prefix + epoch).
> Clients connect and subscribe, then unsubscribe and disconnect after 1
> minute.
>
> When testing the cluster with this setup, the broker pods run fine for the
> first 30 minutes to 1 hour.
>
> After that, the pods run *out of memory* and restart.
>
> I have captured heap dump for various stages of the test:
> 1. Initial Heap Dump: 10 clients connect and then disconnect after 1 minute
> Dump size: 63 MB
> 2. Heap Dump at 83 % memory usage after 15-20 minutes
> Dump size: 370 MB
> 3. Heap dump after OOM
> Dump size: 2.1 GB
>
> From analyzing the heap dumps (I do not have deep knowledge of Artemis), I
> observed the following:
>
> 1. *QueueImpl has a retained heap of 600 MB*, which is 48 percent of the
> heap.
> (Attached QueueImpl_Objects.png)
>
> 2. There is a *mismatch* between the number of NettyServerConnections
> object's connectedClients table and the number of session states stored in
> MQTT
> (Attached NettyAcceptor_ConnectionsAllowed_2000.png &
> Histogram_otherObjects.png)
>
> 3. The thread overview at the time of dump has one active thread which is
> trying to removeSubscriptions as below:
>
> {noformat}
> Thread-4
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
> Status --> alive, Runnable
> All other threads at this point are --> alive, blocked on monitor enter.
> -------------> Thread stack ----------->
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1 @ 0xb56ff6e0 :
> Thread-4
> (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
> at
> java.util.concurrent.ConcurrentHashMap.forEach(Ljava/util/function/BiConsumer;)V
> (ConcurrentHashMap.java:1603)
> at
> org.apache.activemq.artemis.core.postoffice.impl.SimpleAddressManager.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection;
> (SimpleAddressManager.java:165)
> at
> org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection;
> (PostOfficeImpl.java:1097)
> at
> org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)Lorg/apache/activemq/artemis/core/server/impl/AddressInfo;
> (PostOfficeImpl.java:883)
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;Z)V
> (ActiveMQServerImpl.java:4015)
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;)V
> (ActiveMQServerImpl.java:3989)
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZZ)V
> (ActiveMQServerImpl.java:2523)
> at
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZ)V
> (ActiveMQServerImpl.java:2461)
> at
> org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.deleteQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)V
> (ServerSessionImpl.java:1212)
> at
> org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.removeSubscriptions(Ljava/util/List;Z)[S
> (MQTTSubscriptionManager.java:271)
> at
> org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleUnsubscribe(Lio/netty/handler/codec/mqtt/MqttUnsubscribeMessage;)V
> (MQTTProtocolHandler.java:390)
> at
> org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.act(Lio/netty/handler/codec/mqtt/MqttMessage;)V
> (MQTTProtocolHandler.java:180)
> at
> org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler$$Lambda$1042+0x00007f62588397b0.onMessage(Ljava/lang/Object;)V
> ()
> at
> org.apache.activemq.artemis.utils.actors.Actor.doTask(Ljava/lang/Object;)V
> (Actor.java:32)
> at
> org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks()V
> (ProcessorBase.java:68)
> at
> org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$548+0x00007f62585c4d80.run()V
> ()
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
> (ThreadPoolExecutor.java:1136)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run()V
> (ThreadPoolExecutor.java:635)
> at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run()V
> (ActiveMQThreadFactory.java:118) {noformat}
>
> Please suggest how to handle a barrage of incoming connections.