Hi,

I have been conducting a few more performance and load tests with the goal
of increasing the number of participants to 100 or more.

The challenge is:
*If more than 50-60 users dynamically create a room hash (using the
SOAP/REST API) and use that hash to enter the conference room, CPU and
memory usage spike and the server crashes*

*Test scenario observations:*
 - It does not matter if those users try to enter the same room or separate
rooms. In the above test scenario it's a mix of 4x4 conference rooms and
20x1 webinars
 - This can be reproduced reliably and repeatedly
 - The issue starts with API calls taking 10 seconds or more and getting
progressively slower, until the OpenMeetings Tomcat instance crashes
 - -BEFORE- the server crashes, you can also see video pods failing to
complete initialisation in the conference room itself, for example missing
video pods or video pods without a webcam stream. This is likely linked to
slow-running API or web-socket calls
=> I can deliver data samples or screenshots if required via our confluence
space.
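
To make the scenario above easier to repeat, here is a minimal sketch of a
load driver. It is not the harness I used for the tests; the two stub
functions (create_hash_stub, enter_room_stub) are hypothetical placeholders
and would need to be replaced with the real SOAP/REST call and the real
room-entry step. All names in it are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def create_hash_stub(user_id):
    """Hypothetical stand-in for the SOAP/REST call that creates a room hash."""
    time.sleep(0.01)
    return f"hash-{user_id}"


def enter_room_stub(room_hash):
    """Hypothetical stand-in for entering the conference room with the hash."""
    time.sleep(0.01)
    return True


def timed(fn, arg):
    """Run fn(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


def simulate_user(user_id, create_hash, enter_room):
    """One user: create the hash, then immediately enter with it."""
    room_hash, t_create = timed(create_hash, user_id)
    entered, t_enter = timed(enter_room, room_hash)
    return {"user": user_id, "create_s": t_create,
            "enter_s": t_enter, "ok": bool(entered)}


def run_load(n_users, create_hash=create_hash_stub, enter_room=enter_room_stub):
    """Start all users at once, one thread per user, and collect timings."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        futures = [pool.submit(simulate_user, u, create_hash, enter_room)
                   for u in range(n_users)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = run_load(60)
    slowest = max(r["create_s"] for r in results)
    print(f"{len(results)} users, slowest hash creation {slowest:.3f}s")
```

With real calls plugged in, the create_s values should show the API
slowdown before the video pod problems become visible.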

*Hardware and software:*
 - The OpenMeetings server instance runs isolated on separate hardware and
has 2GB of memory allocated
 - There is no spike on KMS or Database hardware/CPU/memory. The spike is
only in the OpenMeetings Tomcat Server instance

*Possible ways to mitigate without code changes:*
 - You can mitigate part of this issue by spreading the users' entry over a
longer time period. However, it takes more than 10min of separation for
50-60 participants to enter without issues
 - You can mitigate part of this issue if you, for example, create the
room hashes in a separate process (say 1h before use) and only enter the
conference room once all hashes are created. This still leads to issues,
but up to 100 users can enter within 5-10min if they just use the links,
rather than creating the link AND entering with it at the same time in
the same process
 - Increasing memory to more than 2GB per Tomcat instance may help, though
I am not sure by how much
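
The first two mitigations can be sketched as follows. This is only an
illustration: arrival_offsets and two_phase are hypothetical helper names,
the create_hash and enter_room arguments stand for whatever client calls
you use, and I am reading the 10min figure as the total window over which
entries are spread, which is an assumption.

```python
def arrival_offsets(n_users, window_s):
    """Evenly spaced start times (in seconds) for n_users within window_s."""
    if n_users <= 0:
        return []
    step = window_s / n_users
    return [round(i * step, 3) for i in range(n_users)]


def two_phase(user_ids, create_hash, enter_room):
    """Create every hash first, then enter using only the finished hashes."""
    # Phase 1: hash creation only (could run long before the meeting).
    hashes = {u: create_hash(u) for u in user_ids}
    # Phase 2: room entry only, no hash creation mixed in.
    return {u: enter_room(h) for u, h in hashes.items()}


if __name__ == "__main__":
    # 60 users over a 10min (600s) window means one entry every 10s.
    print(arrival_offsets(60, 600)[:3])
```

At 60 users over 600s this gives one entry every 10 seconds, which shows
how slow the entry rate has to be for the first mitigation to work.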

 => I think we should spend further time and propose ways to get rid of
those spikes. The mitigations are not realistic to use in practice.

*My proposal is:*
Further analysis is needed:
 - Capture all OpenMeetings calls that happen during room hash creation
and conference room entry
 - Measure the duration of each of those calls, and the CPU spikes and
memory usage attributable to each call
 - Eventually get a stack trace or a profile that exports the current
in-memory objects, to review where and what creates those spikes
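
As a rough idea of what the per-call measurement could produce, here is a
small sketch that groups durations by call name and reports count, mean,
95th percentile and maximum. The CallTimings class and the call name
"createHash" are hypothetical and not part of OpenMeetings; the real
measurement would have to hook into the server or the test client.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean


class CallTimings:
    """Collects durations per call name and summarises them."""

    def __init__(self):
        self._samples = defaultdict(list)

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and store the duration under name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self._samples[name].append(time.perf_counter() - start)

    def record(self, name, seconds):
        """Store a duration measured elsewhere, e.g. parsed from a log."""
        self._samples[name].append(seconds)

    def summary(self):
        """Return count, mean, nearest-rank p95 and max for each call name."""
        out = {}
        for name, values in self._samples.items():
            ordered = sorted(values)
            # Nearest-rank percentile: ceil(0.95 * n) as integer arithmetic.
            rank = (95 * len(ordered) + 99) // 100
            out[name] = {
                "count": len(ordered),
                "mean_s": mean(ordered),
                "p95_s": ordered[rank - 1],
                "max_s": ordered[-1],
            }
        return out


if __name__ == "__main__":
    timings = CallTimings()
    with timings.measure("createHash"):  # hypothetical call name
        time.sleep(0.01)
    print(timings.summary())
```

Sorting the summary by p95_s or max_s should make it obvious which calls
degrade first as the number of users grows.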

Once a per-call analysis is available, it should be much easier to pinpoint
specific issues and propose improvements.

As with all performance optimisation this is likely to need more discussion
once more detailed data is available.

Thanks,
Sebastian

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5
Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
