Shirsh Kumar created ARTEMIS-4947:
-------------------------------------

             Summary: Out of Memory on too many connecting/disconnecting clients
                 Key: ARTEMIS-4947
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4947
             Project: ActiveMQ Artemis
          Issue Type: Bug
            Reporter: Shirsh Kumar
         Attachments: Histogram_otherObjects.png, 
NettyAcceptor_ConnectionsAllowed_2000.png, QueueImpl_Objects.png, broker.xml

I am trying to run the ActiveMQ Artemis broker on Kubernetes using ArtemisCloud's 
[activemq-artemis-operator|https://github.com/artemiscloud/activemq-artemis-operator].
 
 
*Use case* (short-lived clients): Clients connect to the broker, subscribe to 
topics, and disconnect after 1 or 2 minutes.
The cluster uses the [Artemis Kubernetes image 
1.0.28|https://github.com/artemiscloud/activemq-artemis-broker-kubernetes-image] 
with the underlying [Artemis broker version 
2.35.0|https://activemq.apache.org/components/artemis/download/release-notes-2.35.0].
 
Kubernetes Pods config:
CPU: 1
Memory: 2 Gi
Max ConnectionsAllowed: 2000
 
To test the stability of the system, I ran a load test on a cluster that starts 
with 2 pods and is allowed to scale up to 3 pods.
I assumed that limiting ConnectionsAllowed to 2000 would be enough to keep the 
system stable.
Persistence had to be disabled because I was getting 5000 ms timeouts while 
writing session state to disk (see the attached broker.xml).
 
*Load test:* 7000 new client connections per minute (client IDs = some prefix 
+ epoch).
Clients connect and subscribe, then unsubscribe and disconnect after 1 minute.
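As a rough sanity check on these numbers, Little's law (average concurrent connections = arrival rate x average lifetime) gives an estimate of how many clients are connected at once. The sketch below assumes that ConnectionsAllowed applies per pod, that load is spread evenly across pods, and that every client stays connected for exactly 1 minute. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate only; names are hypothetical, not Artemis APIs.
final class ConnectionLoadEstimate {

    // Little's law: average concurrent connections = arrival rate * average lifetime.
    static long steadyStateConnections(long newConnectionsPerMinute, double lifetimeMinutes) {
        return Math.round(newConnectionsPerMinute * lifetimeMinutes);
    }

    // Assumption: the ConnectionsAllowed limit is enforced independently on each pod.
    static long clusterCapacity(int pods, int connectionsAllowedPerPod) {
        return (long) pods * connectionsAllowedPerPod;
    }

    public static void main(String[] args) {
        long demand = steadyStateConnections(7000, 1.0);
        for (int pods = 2; pods <= 3; pods++) {
            long capacity = clusterCapacity(pods, 2000);
            long overLimit = Math.max(0, demand - capacity);
            System.out.printf("pods=%d capacity=%d demand=%d overLimit=%d%n",
                    pods, capacity, demand, overLimit);
        }
    }
}
```

Under these assumptions the offered load (about 7000 concurrent clients) is higher than what 2 or 3 pods would accept (4000 or 6000), so some connection attempts would be refused even while the broker is healthy.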
 
 
With this setup, the broker pods run fine for the first 30 minutes to 1 hour 
of the test.
 
After that, the pods run *out of memory* and restart.
 
I have captured heap dumps at various stages of the test:
1. Initial heap dump (10 clients connect and then disconnect after 1 minute). 
Dump size: 63 MB
2. Heap dump at 83% memory usage, after 15-20 minutes. 
Dump size: 370 MB
3. Heap dump after OOM. 
Dump size: 2.1 GB
 
From my analysis of the heap dumps (I do not have deep knowledge of Artemis), 
I observed the following:
 
1. *QueueImpl has a retained heap of 600 MB*, which accounts for 48 percent of 
the bytes.
(See the attached QueueImpl_Objects.png.)
 
2. There is a *mismatch* between the size of the NettyServerConnections objects' 
connectedClients table and the number of session states stored in MQTT.
(See the attached NettyAcceptor_ConnectionsAllowed_2000.png and 
Histogram_otherObjects.png.)
 
3. The thread overview at the time of the dump shows one active thread, which 
is trying to removeSubscriptions, as shown below:
 
 
{code:java}
Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
Status --> alive, Runnable
All other threads at this point are --> alive, blocked on monitor enter.

-------------> Thread stack ----------->
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1 @ 0xb56ff6e0 : Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@3dad535f)
  at java.util.concurrent.ConcurrentHashMap.forEach(Ljava/util/function/BiConsumer;)V (ConcurrentHashMap.java:1603)
  at org.apache.activemq.artemis.core.postoffice.impl.SimpleAddressManager.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection; (SimpleAddressManager.java:165)
  at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.getDirectBindings(Lorg/apache/activemq/artemis/api/core/SimpleString;)Ljava/util/Collection; (PostOfficeImpl.java:1097)
  at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)Lorg/apache/activemq/artemis/core/server/impl/AddressInfo; (PostOfficeImpl.java:883)
  at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;Z)V (ActiveMQServerImpl.java:4015)
  at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.removeAddressInfo(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;)V (ActiveMQServerImpl.java:3989)
  at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZZ)V (ActiveMQServerImpl.java:2523)
  at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.destroyQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Lorg/apache/activemq/artemis/core/security/SecurityAuth;ZZZ)V (ActiveMQServerImpl.java:2461)
  at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.deleteQueue(Lorg/apache/activemq/artemis/api/core/SimpleString;Z)V (ServerSessionImpl.java:1212)
  at org.apache.activemq.artemis.core.protocol.mqtt.MQTTSubscriptionManager.removeSubscriptions(Ljava/util/List;Z)[S (MQTTSubscriptionManager.java:271)
  at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.handleUnsubscribe(Lio/netty/handler/codec/mqtt/MqttUnsubscribeMessage;)V (MQTTProtocolHandler.java:390)
  at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler.act(Lio/netty/handler/codec/mqtt/MqttMessage;)V (MQTTProtocolHandler.java:180)
  at org.apache.activemq.artemis.core.protocol.mqtt.MQTTProtocolHandler$$Lambda$1042+0x00007f62588397b0.onMessage(Ljava/lang/Object;)V ()
  at org.apache.activemq.artemis.utils.actors.Actor.doTask(Ljava/lang/Object;)V (Actor.java:32)
  at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks()V (ProcessorBase.java:68)
  at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$548+0x00007f62585c4d80.run()V ()
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1136)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:635)
  at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run()V (ActiveMQThreadFactory.java:118)
{code}
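
The stack above shows each unsubscribe going through destroyQueue and removeAddressInfo into getDirectBindings, which iterates a ConcurrentHashMap via forEach. If that iteration covers every binding on the broker (an assumption, not verified against the Artemis source), the cost of each removal grows with the number of live queues, and removing n queues visits n(n+1)/2 entries in total. The toy model below illustrates only that arithmetic; BindingScanModel and its map are hypothetical and are not Artemis code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model (hypothetical, not Artemis code): every queue removal first scans
// the whole binding map to find bindings for the address being removed.
final class BindingScanModel {

    static long entriesVisitedWhileRemovingAll(int queueCount) {
        ConcurrentHashMap<String, String> bindings = new ConcurrentHashMap<>();
        for (int i = 0; i < queueCount; i++) {
            bindings.put("queue-" + i, "address-" + i);
        }

        AtomicLong visited = new AtomicLong();
        for (int i = 0; i < queueCount; i++) {
            String address = "address-" + i;
            AtomicLong matches = new AtomicLong();
            // Full scan: each remaining binding is inspected once per removal.
            bindings.forEach((queue, boundAddress) -> {
                visited.incrementAndGet();
                if (boundAddress.equals(address)) {
                    matches.incrementAndGet();
                }
            });
            bindings.remove("queue-" + i);
        }
        return visited.get();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.printf("queues=%d entriesVisited=%d%n",
                    n, entriesVisitedWhileRemovingAll(n));
        }
    }
}
```

In this model, doubling the number of queues roughly quadruples the total work, which would be consistent with a single busy thread while the other threads wait, but the heap dumps alone do not confirm that this is what happens inside the broker.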
 
Please suggest how to protect the broker against a barrage of incoming 
connections.



