Hi,

I have been running further performance and load tests with the goal of supporting 100+ participants.
The challenge is:

*If more than 50-60 users dynamically create a room hash (using the SOAP/REST API) and use that hash to enter the conference room, CPU and memory usage spike and the server crashes.*

*Test scenario observations:*
- It does not matter whether those users enter the same room or separate rooms. The test scenario above is a mix of 4x4 conference rooms and 20x1 webinars.
- This can be reproduced reliably and repeatedly.
- The issue starts with API calls taking 10+ seconds and getting progressively slower, until the OpenMeetings Tomcat instance crashes.
- The issue also shows -BEFORE- the server crashes: video pods do not complete initialisation in the conference room itself, for example missing video pods or video pods without a webcam stream. This is likely linked to slow-running API or WebSocket calls.

=> I can provide data samples or screenshots via our Confluence space if required.

*Hardware and software:*
- The server and the OpenMeetings instance are isolated on separate hardware, with 2GB of memory allocated.
- There is no spike in KMS or database hardware/CPU/memory. The spike occurs only in the OpenMeetings Tomcat server instance.

*Possible ways to mitigate without code changes:*
- You can partly mitigate the issue by spreading user entry over a longer period. However, 50-60 participants need to be spread over more than 10 minutes to enter without issues.
- You can partly mitigate the issue by creating the room hashes in a separate process (for example, 1h before use) and entering the conference room only once all hashes are created. This still leads to issues, but up to 100 users can enter within 5-10 min if they only use the prepared links, rather than creating the link AND entering with it at the same time, in the same process.
- Allocating more than 2GB of memory per Tomcat instance may help, though I am not sure by how much.

=> I think we should spend more time on this and propose ways to get rid of those spikes.
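To reproduce the burst described above and collect per-call timings on the client side, something like the following Python sketch could be used. It is only an illustration under assumptions: `create_room_hash` and `enter_room` are hypothetical placeholders (not the actual OpenMeetings SOAP/REST API) that merely sleep so the script runs offline, and `stagger_s` models spreading user entry over time.

```python
import statistics
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class CallTimer:
    """Thread-safe collector of wall-clock durations, grouped by call name."""

    def __init__(self):
        self._lock = Lock()
        self._samples = defaultdict(list)

    def timed(self, name, fn, *args, **kwargs):
        """Run fn(*args, **kwargs) and record how long it took under `name`."""
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self._samples[name].append(elapsed)

    def summary(self):
        """Per call name: sample count, median, nearest-rank p95 and max (seconds)."""
        result = {}
        with self._lock:
            for name, values in self._samples.items():
                ordered = sorted(values)
                rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
                result[name] = {
                    "count": len(ordered),
                    "median_s": statistics.median(ordered),
                    "p95_s": ordered[max(rank - 1, 0)],
                    "max_s": ordered[-1],
                }
        return result


# HYPOTHETICAL placeholders: swap in the real SOAP/REST "create hash" call and
# the real room-entry request. They only sleep so the sketch runs offline.
def create_room_hash(user_id, room_id):
    time.sleep(0.01)
    return f"hash-{user_id}-{room_id}"


def enter_room(room_hash):
    time.sleep(0.02)
    return True


def simulate_user(timer, user_id, room_id, delay_s):
    time.sleep(delay_s)  # optional stagger, mimicking users arriving over time
    room_hash = timer.timed("create_room_hash", create_room_hash, user_id, room_id)
    return timer.timed("enter_room", enter_room, room_hash)


def run_load_test(users, rooms, stagger_s=0.0):
    """Start `users` simulated users spread over `rooms`; stagger_s=0 is a burst."""
    timer = CallTimer()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(simulate_user, timer, u, u % rooms, u * stagger_s)
            for u in range(users)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return timer.summary()


if __name__ == "__main__":
    for call_name, stats in run_load_test(users=60, rooms=8).items():
        print(call_name, stats)
```

Comparing a burst run (`stagger_s=0`) with a staggered one would give a client-side view of how call latency grows with concurrency. Server-side CPU and heap usage would still have to be captured separately on the Tomcat JVM, for example with a profiler or a heap dump.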
The mitigations above are not realistic to use in practice.

*My proposal is:*
Further analysis is needed:
- Capture all OpenMeetings calls that happen during room hash creation and conference room entry
- Measure the duration of each call during room hash creation and room entry, together with CPU spikes and memory usage on a per-call basis
- Possibly also capture a stack trace, or a profile that exports the current in-memory objects, to review where the spikes come from and what creates them

Once a per-call analysis is available, it should be much easier to pinpoint specific issues and propose improvements. As with all performance optimisation, this will likely need more discussion once more detailed data is available.

Thanks,
Sebastian

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
