mortenstevens opened a new issue, #13382:
URL: https://github.com/apache/cloudstack/issues/13382

   ### problem
   
   We are experiencing a critical connection leak in Apache CloudStack 4.22.1.0 
(running on Ubuntu 24.04). The background task for Out-of-Band Management 
(OOBM) leaks database connections every time it executes.
   
   Even after setting wait_timeout on the MySQL server or trying to inject 
additional pool properties, the leaked connections stay marked as active inside 
the Java connection pool. The active count eventually reaches the 
db.cloud.maxActive limit (250 by default). At that point the management server 
stops responding and throws SQLTransientConnectionException.
   
   ### versions
   
   CloudStack Version: 4.22.1.0
   OS: Ubuntu 24.04.4 LTS
   DB: MySQL 8.0.45
   Java Version: 17.0.19
   
   ### The steps to reproduce the bug
   
   1. Configure Out-of-Band Management (OOBM) for multiple physical hosts
   2. Set outofbandmanagement.background.task.execution.interval to a lower 
value for testing purposes (e.g., 60 or 300) to accelerate the leak.
   3. Compare the output of MySQL's SHOW FULL PROCESSLIST; against the CloudStack 
management server metrics over time.
   ...
   
   
   ### What to do about it?
   
   Every time the OOBM task runs, it opens one connection per configured host 
(three connections in total for our setup). These connections are never returned 
to the HikariCP pool. The likely cause is a missing .close() call or an unhandled 
exception path in the OOBM plugin execution layer.
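The suspected failure mode is a borrowed connection that is never closed, especially when an exception escapes. The minimal, self-contained sketch below illustrates it. `ToyPool`, `PooledConnection`, and `OobmTaskSketch` are hypothetical stand-ins, not CloudStack or HikariCP classes. The sketch only contrasts a missing `close()` with Java's try-with-resources, which closes the resource on both the normal and the exception path.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical toy pool: it only counts borrowed ("active") connections. */
final class ToyPool {
    private final AtomicInteger active = new AtomicInteger();

    /** Hypothetical pooled connection; close() returns it to the pool. */
    final class PooledConnection implements AutoCloseable {
        private boolean returned = false;

        void query(String hostName) {
            // Simulate a failing OOBM query for a misconfigured host.
            if (hostName.isEmpty()) {
                throw new IllegalStateException("OOBM query failed");
            }
        }

        @Override
        public void close() {
            if (!returned) { // make close() idempotent
                returned = true;
                active.decrementAndGet();
            }
        }
    }

    PooledConnection borrow() {
        active.incrementAndGet();
        return new PooledConnection();
    }

    int active() {
        return active.get();
    }
}

final class OobmTaskSketch {
    /** Leaky variant: close() is never called, so every borrowed connection stays active. */
    static void leakyRun(ToyPool pool, String... hosts) {
        for (String host : hosts) {
            try {
                ToyPool.PooledConnection c = pool.borrow();
                c.query(host);
                // missing c.close()
            } catch (IllegalStateException e) {
                // the error would be logged here, but the connection is not returned
            }
        }
    }

    /** Safe variant: try-with-resources returns the connection on every path. */
    static void safeRun(ToyPool pool, String... hosts) {
        for (String host : hosts) {
            try (ToyPool.PooledConnection c = pool.borrow()) {
                c.query(host);
            } catch (IllegalStateException e) {
                // the connection has already been closed before this block runs
            }
        }
    }

    public static void main(String[] args) {
        ToyPool leaky = new ToyPool();
        ToyPool safe = new ToyPool();
        for (int run = 0; run < 10; run++) { // ten task runs, three hosts each
            leakyRun(leaky, "host1", "host2", "");
            safeRun(safe, "host1", "host2", "");
        }
        // prints "leaky active=30, safe active=0"
        System.out.println("leaky active=" + leaky.active() + ", safe active=" + safe.active());
    }
}
```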
   
   The mismatch between the DB and the Java pool:
   
   MySQL side: The connections show up in the Sleep state. If MySQL kills them 
via wait_timeout, the sockets are closed at the network layer.
   
   CloudStack/HikariCP side: The OOBM thread still holds the leaked connections, 
so they remain flagged as active (in use). As a result, HikariCP never runs a 
health check on them and does not evict them via maxLifetime. The internal 
counter stays at active=250.
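HikariCP's documentation for maxLifetime states that an in-use connection is never retired. It is removed only after it is closed. The hypothetical model below shows why that keeps the active count pinned. `LifetimeSweepModel` is an illustrative simplification, not HikariCP's actual housekeeping code. A sweep retires idle connections past maxLifetime and leaves in-use ones alone regardless of age.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a pool housekeeping sweep; not HikariCP's real implementation. */
final class LifetimeSweepModel {
    record Conn(long createdAtMillis, boolean inUse) {}

    /** Idle connections older than maxLifetime are retired; in-use ones always survive. */
    static List<Conn> sweep(List<Conn> conns, long nowMillis, long maxLifetimeMillis) {
        List<Conn> survivors = new ArrayList<>();
        for (Conn c : conns) {
            boolean expired = nowMillis - c.createdAtMillis() > maxLifetimeMillis;
            if (expired && !c.inUse()) {
                continue; // retired: idle and past maxLifetime
            }
            survivors.add(c); // leaked (in-use) connections are kept, even if their socket is dead
        }
        return survivors;
    }

    static long countActive(List<Conn> conns) {
        return conns.stream().filter(Conn::inUse).count();
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<Conn> pool = List.of(
                new Conn(0, true),   // leaked by the OOBM task
                new Conn(0, false)); // genuinely idle
        List<Conn> after = sweep(pool, 2 * hour, hour);
        // prints "survivors=1, active=1": only the idle connection was retired
        System.out.println("survivors=" + after.size() + ", active=" + countActive(after));
    }
}
```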
   
   Once the counter reaches 250, the management server stops responding.
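The numbers above allow a rough estimate of how quickly the leak alone exhausts the pool. This sketch assumes one task run per interval, exactly three leaked connections per run, and no other threads holding connections. Other management-server threads also draw from the same pool, so treat the result as an upper bound. `PoolExhaustionEstimate` is a hypothetical helper.

```java
import java.time.Duration;

/** Hypothetical back-of-the-envelope estimate of time until the leak exhausts the pool. */
final class PoolExhaustionEstimate {
    /** Smallest number of runs after which leaked connections reach maxActive. */
    static int runsUntilExhausted(int maxActive, int leakedPerRun) {
        return (maxActive + leakedPerRun - 1) / leakedPerRun; // ceiling division
    }

    static Duration timeUntilExhausted(int maxActive, int leakedPerRun, Duration interval) {
        return interval.multipliedBy(runsUntilExhausted(maxActive, leakedPerRun));
    }

    public static void main(String[] args) {
        for (long seconds : new long[] {60, 300}) {
            Duration d = timeUntilExhausted(250, 3, Duration.ofSeconds(seconds));
            // prints "interval=60s -> ~84 min", then "interval=300s -> ~420 min"
            System.out.println("interval=" + seconds + "s -> ~" + d.toMinutes() + " min");
        }
    }
}
```

With three hosts, 84 runs leak 252 connections, which is enough to hit the 250 limit. That is roughly 84 minutes at a 60-second interval and about 7 hours at 300 seconds.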
   
   Logs & Error Stacktrace:
   
   2026-06-08 21:14:28,070 ERROR [c.c.s.S.ManagementServerCollector] 
(StatsCollector-1:[ctx-fd90ba07]) (logid:182faa68) Error trying to retrieve 
management server host statistics 
com.cloud.utils.exception.CloudRuntimeException: Unable to find on DB, due to: 
cloud - Connection is not available, request timed out after 30000ms 
(total=250, active=250, idle=0, waiting=22)
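To track the pool counters over time (step 3 above), the `(total=..., active=..., idle=..., waiting=...)` fragment can be pulled out of log lines like the one quoted. The following is a minimal sketch. The regular expression is inferred from this single message only, and other log formats may differ. `PoolStatsParser` is a hypothetical helper.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the pool counters in a "Connection is not available" message. */
final class PoolStatsParser {
    record PoolStats(int total, int active, int idle, int waiting) {
        /** True when every connection is checked out and none is idle. */
        boolean exhausted() {
            return idle == 0 && active == total;
        }
    }

    // Pattern assumed from the quoted log line: "(total=250, active=250, idle=0, waiting=22)"
    private static final Pattern STATS = Pattern.compile(
            "\\(total=(\\d+), active=(\\d+), idle=(\\d+), waiting=(\\d+)\\)");

    static Optional<PoolStats> parse(String logLine) {
        Matcher m = STATS.matcher(logLine);
        if (!m.find()) {
            return Optional.empty(); // line carries no pool counters
        }
        return Optional.of(new PoolStats(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4))));
    }
}
```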
   
   Is there any workaround available? Would switching the connection pool to DBCP help?

