On Tue, 2 Feb 2021 at 07:23, [email protected] <[email protected]> wrote:
> Hi,
>
> I have been conducting a few more performance and load tests with the
> goal of increasing participants to 100++.
>
> The challenge is:
> *If more than 50-60 users dynamically create a room hash (using the
> SOAP/REST API) and use that hash to enter the conference room, CPU and
> memory spike and the server crashes*

Can you share the API call sequence? Maybe we can write a JMeter scenario
for this? (a rough sketch of what I mean is further down in this mail)
A server crash is something bad. What exactly is happening? Is it a JVM
crash? Or is the system low on resources and the kernel kills the
trouble-maker?

> *Test scenario observations:*
> - It does not matter whether those users try to enter the same room or
> separate rooms. In the above test scenario it is a mix of 4x4 conference
> rooms and 20x1 webinars
> - This can be reproduced stably and repeatedly
> - The issue starts with API calls taking 10sec++ and getting slower and
> slower, until the OpenMeetings Tomcat instance crashes
> - The issue also shows itself -BEFORE- the server crashes: you can see
> video pods not completing initialisation in the conference room itself,
> for example missing video pods or video pods without a webcam stream.
> This is likely linked to slow-running API or web-socket calls
> => I can provide data samples or screenshots via our Confluence space if
> required.
>
> *Hardware and software:*
> - The server and OpenMeetings instance are isolated on separate hardware
> and have 2GB of memory allocated
> - There is no spike on the KMS or database hardware/CPU/memory. The spike
> is only in the OpenMeetings Tomcat server instance
>
> *Possible ways to mitigate without code changes:*
> - You can mitigate part of this issue if you spread the users' entry over
> a longer time period. However, 50-60 participants need more than 10min of
> separation to enter without issues
> - You can mitigate part of this issue if you, for example, create the
> room hash in a separate process (like 1h before use) and only enter the
> conference room once all hashes are created. It still leads to issues,
> but you can enter up to 100 users within 5-10min if you just use the
> links, rather than creating the link AND entering with the link in the
> same process at the same time
> - Increasing Tomcat to more than 2GB of memory per instance may help,
> though I am not sure by how much
>
> => I think we should spend further time and propose ways to get rid of
> those spikes. The mitigations are not realistic to use in practice.
>
> *My proposal is:*
> Further analysis is needed:
> - Capture all OpenMeetings calls that happen during room-hash creation
> and conference-room entry
> - Measure the length of every call during room-hash creation and
> conference-room entry, plus CPU spikes and memory usage on a per-call
> basis
> - Eventually get a stack trace, or have a profiler export the current
> in-memory objects, to review where and what creates those spikes
>
> Once a per-call analysis is available it should be much easier to
> pinpoint specific issues and propose improvements.
>
> As with all performance optimisation, this is likely to need more
> discussion once more detailed data is available.
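To make the question about the call sequence concrete, here is a rough
skeleton of the load test I have in mind, written in plain Java rather than
as a JMeter plan. The host, endpoint paths and parameter names in it are
only my assumptions from memory, so please correct them to match the exact
SOAP/REST calls your integration performs:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough load-test skeleton. The base URL, endpoint paths and parameters
// below are assumptions -- replace them with the calls you actually make.
public class RoomHashLoadTest {
    private static final String BASE = "https://om.example.com/openmeetings"; // hypothetical host
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        int users = 60; // the range where the spike reportedly starts
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int i = 0; i < users; i++) {
            final int user = i;
            pool.submit(() -> {
                try {
                    // 1. login via the REST user service to obtain a session id (SID)
                    String sid = get(BASE + "/services/user/login?user=admin&pass=***");
                    // 2. create the room hash for this external user (the SOAP/REST "hash" call)
                    String hash = get(BASE + "/services/user/hash?sid=" + sid + "&user=ext" + user);
                    // 3. enter the room with that hash, like the browser does
                    get(BASE + "/hash?secure=" + hash);
                } catch (Exception e) {
                    System.err.println("user " + user + " failed: " + e);
                }
            });
        }
        pool.shutdown();
    }

    private static String get(String url) throws Exception {
        HttpResponse<String> resp = CLIENT.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        // a real test would parse the JSON/XML response; returning the raw
        // body just keeps the sketch short
        return resp.body();
    }
}

If you can share the real sequence, it should be straightforward to turn the
same loop into a proper JMeter test plan and reproduce the spike on our side.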
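For the per-call measurement proposed above: a simple servlet Filter added
to the webapp would already log the duration of every REST and room call,
so we can see which calls slow down first. The CallTimingFilter below is
only a sketch I made up, not something that exists in OpenMeetings, and it
assumes the javax.servlet API used by Tomcat 9 / OpenMeetings 5.x:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;

// Hypothetical timing filter: logs how long every request (REST hash
// creation, room enter, websocket handshake) takes, so the calls that slow
// down first are visible before Tomcat starts to struggle.
@WebFilter("/*")
public class CallTimingFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        long start = System.nanoTime();
        try {
            chain.doFilter(req, resp);
        } finally {
            long ms = (System.nanoTime() - start) / 1_000_000;
            String uri = (req instanceof HttpServletRequest)
                    ? ((HttpServletRequest) req).getRequestURI() : "?";
            // anything above a threshold deserves a closer look / thread dump
            if (ms > 1000) {
                System.out.println("SLOW " + ms + "ms " + uri);
            }
        }
    }
}

Once we know which calls cross, say, the one-second mark, a thread dump
taken at that moment should show where they are stuck, and a heap dump
should show what is filling the 2GB.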
>
> Thanks,
> Sebastian
>
> Sebastian Wagner
> Director Arrakeen Solutions, OM-Hosting.com
> http://arrakeen-solutions.co.nz/
> https://om-hosting.com - Cloud & Server Hosting for HTML5
> Video-Conferencing OpenMeetings
> <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>

--
Best regards,
Maxim
