[ 
https://issues.apache.org/jira/browse/ARTEMIS-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shirsh Kumar updated ARTEMIS-4947:
----------------------------------
    Attachment: leftover_client_data.png
                histogram.png

> Out of Memory on too many connecting/disconnecting MQTT clients
> ---------------------------------------------------------------
>
>                 Key: ARTEMIS-4947
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4947
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Shirsh Kumar
>            Priority: Major
>         Attachments: 2.36-ss-0.hprof.7z, Histogram_otherObjects.png, 
> NettyAcceptor_ConnectionsAllowed_2000.png, QueueImpl_Objects.png, broker.xml, 
> dump_oom.7z.001, dump_oom.7z.002, histogram.png, leftover_client_data.png, 
> msgloadtest.7z, single-pod-oom-heap-dump.7z, single-pod-oom-thread-dump.dump
>
>
> I am trying to run the ActiveMQ Artemis broker on Kubernetes using 
> ArtemisCloud's 
> [activemq-artemis-operator|https://github.com/artemiscloud/activemq-artemis-operator].
>  
>  
> *Use case* (short-lived clients): Clients connect to the broker, subscribe 
> to topics, and disconnect after 1 or 2 minutes.
> The cluster uses the [Artemis Kubernetes image 
> 1.0.28|https://github.com/artemiscloud/activemq-artemis-broker-kubernetes-image] 
> with the underlying [Artemis broker version 
> 2.35.0|https://activemq.apache.org/components/artemis/download/release-notes-2.35.0].
>  
> Kubernetes Pods config:
> CPU: 1
> Memory: 2 Gi
> Max ConnectionsAllowed: 2000
>  
> To test the stability of the system, I ran a load test on a cluster that 
> starts with 2 pods and is allowed to scale up to 3 pods.
> I assumed that limiting ConnectionsAllowed to 2000 would be enough to keep 
> the system stable.
> Persistence had to be disabled because I was getting 5000 ms timeouts while 
> writing session state to disk (see the attached broker.xml).
>  
> *Load test:* 7000 new client connections per minute (client ID = some 
> prefix + epoch).
> Each client connects and subscribes, then unsubscribes and disconnects after 
> 1 minute.
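
As a rough sanity check on these numbers, the steady-state number of concurrently connected clients can be estimated with Little's law (arrival rate times time in system). The sketch below assumes evenly spread arrivals, a fixed 60-second client lifetime, and that the ConnectionsAllowed limit of 2000 applies per pod; the function names are illustrative only and are not part of Artemis.

```python
# Back-of-the-envelope estimate of concurrent connections under the load test.
# Assumptions (not stated in the report): arrivals are evenly spread, every
# client stays connected for exactly `lifetime_s` seconds, and the connection
# limit applies per pod.

def concurrent_clients(arrivals_per_min: float, lifetime_s: float) -> float:
    """Little's law: clients in system = arrival rate * time in system."""
    return arrivals_per_min * lifetime_s / 60.0


def cluster_capacity(pods: int, connections_allowed: int) -> int:
    """Total connections the cluster accepts before refusing new ones."""
    return pods * connections_allowed


if __name__ == "__main__":
    demand = concurrent_clients(arrivals_per_min=7000, lifetime_s=60)
    for pods in (2, 3):
        capacity = cluster_capacity(pods, connections_allowed=2000)
        excess = max(0.0, demand - capacity)
        print(f"{pods} pods: demand ~{demand:.0f}, "
              f"capacity {capacity}, excess ~{excess:.0f}")
```

Under these assumptions the test asks for roughly 7000 concurrent connections, which is above what 2 or 3 pods would accept at 2000 connections each, so part of the load would be refused at the acceptor rather than served.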
>  
> With this setup, the broker pods run fine for the first 30 minutes to 1 
> hour.
>  
> After that, the pods run *out of memory* and restart.
>  
> I captured heap dumps at various stages of the test:
> 1. Initial heap dump (10 clients connect and then disconnect after 1 minute). 
> Dump size: 63 MB
> 2. Heap dump at 83% memory usage, after 15-20 minutes. 
> Dump size: 370 MB
> 3. Heap dump after OOM. 
> Dump size: 2.1 GB
>  
> From analyzing the heap dumps (I do not have deep knowledge of Artemis), I 
> observed the following:
>  
> 1. *QueueImpl has a retained heap of 600 MB*, which is 48 percent of the 
> bytes.
> (Attached QueueImpl_Objects.png)
>  
> 2. There is a *mismatch* between the number of entries in the 
> NettyServerConnections object's connectedClients table and the number of 
> session states stored in MQTT.
> (Attached NettyAcceptor_ConnectionsAllowed_2000.png & 
> Histogram_otherObjects.png)
>  
> 3. The thread overview at the time of the dump shows one active thread, 
> which is executing removeSubscriptions, as shown below:
>  
> {noformat}
> Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
> Status --> alive, Runnable
> All other threads at this point are --> alive, blocked on monitor enter.
> -------------> Thread stack ----------->
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1 @ 0xb56ff6e0 : Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
>   at java.util.concurrent.ConcurrentHashMap.forEach(Ljava/util/function/BiConsumer;)V (ConcurrentHashMap.java:1603)
>   at org.apache.activemq.artemis.core.postoffice.impl.SimpleAddressManager.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection; (SimpleAddressManager.java:165)
>   at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection; (PostOfficeImpl.java:1097)
>   at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)Lorg/apache/activemq/artemis/core/server/impl/AddressInfo; (PostOfficeImpl.java:883)
>   at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;Z)V (ActiveMQServerImpl.java:4015)
>   at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;)V (ActiveMQServerImpl.java:3989)
>   at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZZ)V (ActiveMQServerImpl.java:2523)
>   at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZ)V (ActiveMQServerImpl.java:2461)
>   at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.deleteQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)V (ServerSessionImpl.java:1212)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.removeSubscriptions(Ljava/util/List;Z)[S (MQTTSubscriptionManager.java:271)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleUnsubscribe(Lio/netty/handler/codec/mqtt/MqttUnsubscribeMessage;)V (MQTTProtocolHandler.java:390)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.act(Lio/netty/handler/codec/mqtt/MqttMessage;)V (MQTTProtocolHandler.java:180)
>   at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler$$Lambda$1042+0x00007f62588397b0.onMessage(Ljava/lang/Object;)V ()
>   at org.apache.activemq.artemis.utils.actors.Actor.doTask(Ljava/lang/Object;)V (Actor.java:32)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks()V (ProcessorBase.java:68)
>   at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$548+0x00007f62585c4d80.run()V ()
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1136)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:635)
>   at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run()V (ActiveMQThreadFactory.java:118)
> {noformat}
>  
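
Regarding observation 2 above (more stored session states than live connections), one way to picture how such a mismatch could turn into unbounded growth: if every connection uses a fresh client ID, as in this load test, and per-client state outlives the connection, then the amount of stored state follows the total number of clients ever seen, while live connections stay bounded. The toy model below is only an illustration under that assumption. It is not Artemis code, `ToyBroker` and `retain_state` are hypothetical names, and the report does not establish whether state is actually retained in this setup.

```python
# Toy model: live connections vs. retained per-client session state.
# Not Artemis code. `ToyBroker`, `retain_state`, and the field names are
# hypothetical and only illustrate the shape of the mismatch.

class ToyBroker:
    def __init__(self, retain_state: bool):
        self.retain_state = retain_state
        self.connected = set()      # client IDs with a live connection
        self.session_states = {}    # client ID -> subscribed topics

    def connect(self, client_id, topics):
        self.connected.add(client_id)
        self.session_states[client_id] = list(topics)

    def disconnect(self, client_id):
        self.connected.discard(client_id)
        if not self.retain_state:
            # State is dropped together with the connection.
            self.session_states.pop(client_id, None)


def run(retain_state, minutes, per_minute):
    """Return (live connections, stored session states) after the run."""
    broker = ToyBroker(retain_state)
    previous = []
    for minute in range(minutes):
        # Clients from the previous minute disconnect after about one minute.
        for client_id in previous:
            broker.disconnect(client_id)
        # Fresh, never-reused client IDs, mimicking "prefix + epoch".
        previous = [f"load-{minute}-{i}" for i in range(per_minute)]
        for client_id in previous:
            broker.connect(client_id, ["some/topic"])
    return len(broker.connected), len(broker.session_states)


if __name__ == "__main__":
    print("state dropped :", run(False, minutes=30, per_minute=100))
    print("state retained:", run(True, minutes=30, per_minute=100))
```

In the retained case the live connection count stays flat while the stored state keeps growing with every new client ID, which would match the general shape of a heap that fills up over 30 to 60 minutes.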
> Please suggest how to protect the broker against a barrage of incoming 
> connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


