The API call sequence is:
- UserService > login
- RoomService > getExternal
- UserService > getRoomHash
- Browser loads the URL $myURL?secureHash=XYZ (which in turn triggers a LOT of further internal calls in OpenMeetings)

=> This crashes the OpenMeetings Tomcat when 50-60 users enter the same server instance within 5 minutes.
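
To reproduce the sequence above under load, a small driver along these lines might be enough as a starting point. It is only a sketch: the step callables are hypothetical placeholders for the real SOAP/REST calls and the browser load, and `run_user`, `run_load` and `ramp_seconds` are names made up for illustration, not anything that exists in OpenMeetings.

```python
import threading
import time
from typing import Callable, List, Sequence, Tuple

# (step name, callable taking the simulated user id). The callables are
# placeholders: a real test would issue the actual API call or page load here.
Step = Tuple[str, Callable[[int], None]]
# (user id, step name, elapsed seconds, succeeded)
Sample = Tuple[int, str, float, bool]


def run_user(user_id: int, steps: Sequence[Step],
             samples: List[Sample], lock: threading.Lock) -> None:
    """Run the steps in order for one simulated user, timing each call."""
    for name, call in steps:
        started = time.perf_counter()
        ok = True
        try:
            call(user_id)
        except Exception:
            ok = False
        elapsed = time.perf_counter() - started
        with lock:
            samples.append((user_id, name, elapsed, ok))
        if not ok:
            break  # later steps depend on the earlier ones


def run_load(steps: Sequence[Step], users: int,
             ramp_seconds: float) -> List[Sample]:
    """Start `users` simulated users spread evenly over `ramp_seconds`."""
    samples: List[Sample] = []
    lock = threading.Lock()
    gap = ramp_seconds / users if users > 0 else 0.0
    threads = []
    for user_id in range(users):
        thread = threading.Thread(
            target=run_user, args=(user_id, steps, samples, lock))
        thread.start()
        threads.append(thread)
        if user_id < users - 1:
            time.sleep(gap)  # stagger the arrivals
    for thread in threads:
        thread.join()
    return samples


if __name__ == "__main__":
    def placeholder(user_id: int) -> None:
        time.sleep(0.01)  # stands in for a real request

    sequence = [
        ("UserService.login", placeholder),
        ("RoomService.getExternal", placeholder),
        ("UserService.getRoomHash", placeholder),
        ("browser load secureHash URL", placeholder),
    ]
    for sample in run_load(sequence, users=5, ramp_seconds=0.5):
        print(sample)
```

Setting `users=60` and `ramp_seconds=300` would correspond to the 50-60 users within 5 minutes described above. Dropping the first three steps gives the second scenario, where the hash already exists.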
You can also crash the server (or make it significantly slower) with JUST:
- Browser loads the URL $myURL?secureHash=XYZ (assuming the hash already exists)

=> With ~100 users entering you start to see performance degradation of the OpenMeetings Tomcat instance:
- Video pods disappearing
- Slow response times

It would be interesting to find out how long it would take to crash in this scenario, but I would guess at ~150 users.

Thanks
Sebastian

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>

On Tue, 2 Feb 2021 at 15:55, Maxim Solodovnik <[email protected]> wrote:

> On Tue, 2 Feb 2021 at 07:23, [email protected] <[email protected]> wrote:
>
> > Hi,
> >
> > I have been conducting a few more performance and load tests with the goal
> > of increasing participants to 100++.
> >
> > The challenge is:
> > *If more than 50-60 users dynamically create a room Hash (using Soap/Rest
> > API) and use that Hash to enter the conference room CPU and memory spikes
> > and server crashes*
>
> Can you share the API call sequence?
> Maybe we can write a JMeter scenario for this?
>
> A server crash is something bad.
> What is happening? Is it a JVM crash? Or is the system low on resources and
> the kernel kills the trouble-maker?
>
> > *Test scenario observations:*
> > - It does not matter if those users try to enter the same room or separate
> > rooms. In the above test scenario it's a mix of 4x4 conference rooms and
> > 20x1 webinars
> > - This can be reproduced reliably and repeatedly
> > - The issue starts with API calls taking 10sec++ and getting slower.
> > Until the OpenMeetings Tomcat instance crashes
> > - The issue also manifests in that -BEFORE- the server crashes you can see
> > video pods not completing the initialisation in the conference room itself.
> > For example missing video pods or video pods without a webcam stream.
> > Likely to be linked to slow running API or web-socket calls
> > => I can deliver data samples or screenshots if required via our confluence
> > space.
> >
> > *Hardware and software:*
> > - Server and OpenMeetings instance are isolated on separate hardware and
> > have 2GB of memory allocated
> > - There is no spike on KMS or Database hardware/CPU/memory. The spike is
> > only in the OpenMeetings Tomcat Server instance
> >
> > *Possible ways to mitigate without code changes:*
> > - You can mitigate part of this issue if you spread the users to enter
> > over a longer time period. However it needs more than 10min separation to
> > enter without issues for 50-60 participants
> > - You can mitigate part of this issue if you for example create the
> > room-hash in a different process (like 1h before using) and once all hashes
> > are created you enter the conference room. It still leads to issues, but
> > you can enter up to 100 users within 5-10min, if you just use the links,
> > rather than creating the link AND entering with the link at the same
> > time/process
> > - Increasing Tomcat to more than 2GB of memory per Tomcat instance may
> > help, not sure by how much though
> >
> > => I think we should spend further time and propose ways to get rid of
> > those spikes. The mitigations are not realistic to really be able to use in
> > practice.
> >
> > *My proposal is:*
> > There is further analysis needed:
> > - Capture all OpenMeetings calls that happen during the create room hash
> > and conference room-enter
> > - Measure call lengths and any calls during the create room hash and
> > conference room-enter and specific CPU spikes or memory usage based on a
> > per call basis
> > - Eventually get a stack trace or have a profile available that exports
> > the current in memory objects to review where and what create those spikes
> >
> > Once a per-call analysis is there it should be a lot more easy to pinpoint
> > specific issues and propose improvements.
> >
> > As with all performance optimisation this is likely to need more discussion
> > once more detailed data is available.
> >
> > Thanks,
> > Sebastian
> >
> > Sebastian Wagner
> > Director Arrakeen Solutions, OM-Hosting.com
> > http://arrakeen-solutions.co.nz/
> > https://om-hosting.com - Cloud & Server Hosting for HTML5
> > Video-Conferencing OpenMeetings
> > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
> >
>
> --
> Best regards,
> Maxim
>
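
For the per-call analysis proposed in the quoted message, the timing data could be reduced to a small per-call report roughly like this. It is a sketch under assumptions: the (call name, seconds) sample format, the `summarise` function and the 10 second default for "slow" (taken from the "10sec++" observation) are all illustrative choices, and the numbers in the usage section are made up purely to show the output shape, not measured results.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarise(samples: Iterable[Tuple[str, float]],
              slow_threshold: float = 10.0) -> Dict[str, Dict[str, float]]:
    """Group (call name, seconds) samples and compute per-call statistics."""
    by_call: Dict[str, List[float]] = defaultdict(list)
    for name, seconds in samples:
        by_call[name].append(seconds)

    report: Dict[str, Dict[str, float]] = {}
    for name, values in by_call.items():
        values.sort()
        # Nearest-rank 95th percentile: smallest value with at least 95% of
        # the samples at or below it.
        rank = max(1, math.ceil(0.95 * len(values)))
        report[name] = {
            "count": len(values),
            "mean": mean(values),
            "p95": values[rank - 1],
            "max": values[-1],
            # Number of calls at or above the "slow" threshold.
            "slow": sum(1 for v in values if v >= slow_threshold),
        }
    return report


if __name__ == "__main__":
    # Made-up durations, only to illustrate the report layout.
    demo = [
        ("UserService.login", 0.2),
        ("UserService.login", 0.4),
        ("UserService.getRoomHash", 1.0),
        ("UserService.getRoomHash", 12.0),
    ]
    for call_name, stats in summarise(demo).items():
        print(call_name, stats)
```

Sorting the report by `p95` or `slow` should show fairly quickly which call degrades first as more users enter.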
