Re: Performance & Load Testing Results and next steps - Improving OpenMeetings sign in and room enter performance

[email protected] Tue, 02 Feb 2021 00:26:18 -0800

I doubled the memory to 4GB for OpenMeetings and 4GB for KMS, and updated
the Docker instance to run OpenMeetings with Xms=2GB and Xmx=4GB.


I then ran exactly the same test again:
 - 50-60 users
 - staggered to enter over a period of around 5-10min
 - distributed into 10 conference rooms 4x4 and 2 webinars with 20 users
each
 - each test run calls the API to login/createRoomHash and then loads the
URL with the room (plus starts a webcam/audio stream in the conference rooms)

The results look almost the same. There is hardly any improvement:

   - CPU still spikes to almost 100%, memory is not a problem
   - Empty video pods, as well as video pods where the webcam stream didn't
     start, still occur

There isn't a crash, but that is mostly because I staggered the users
entering the server over a 5-10min period, which didn't crash the 2GB
instance either.

Comparison of the CPU graphs of both hardware configurations and test runs:
https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021

There is pretty much no improvement.

Some work is needed on the application side. This does not look like it
will get better by throwing more hardware at it.

It is really quite limiting to have no logs of any performance indicators,
such as call duration, to narrow down where the bottleneck is.
You may find some very low-hanging fruit in terms of optimisation if you
simply concentrate on the top ten calls and optimise those, rather than
looking at CPU and memory graphs.
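
To illustrate what I mean by concentrating on the top ten calls, here is a
minimal Python sketch. It assumes we already had per-call timing samples as
(call name, duration in ms) pairs, which OpenMeetings does not log today as
far as I can see. The function name top_calls, the sample data and the call
names in it are made up for illustration and are not real measurements.

```python
from collections import defaultdict
from statistics import mean


def top_calls(samples, n=10):
    """Rank call names by total time spent.

    samples: iterable of (call_name, duration_ms) pairs, for example parsed
    from a hypothetical per-call timing log.
    Returns a list of (call_name, count, total_ms, mean_ms), slowest first.
    """
    durations = defaultdict(list)
    for name, duration_ms in samples:
        durations[name].append(duration_ms)
    rows = [(name, len(d), sum(d), mean(d)) for name, d in durations.items()]
    # Sort by total time so frequent, moderately slow calls also surface.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


# Made-up sample data; names and durations are placeholders only.
samples = [
    ("login", 120), ("createRoomHash", 900), ("enterRoom", 4000),
    ("login", 180), ("createRoomHash", 1100), ("enterRoom", 6000),
]

for name, count, total_ms, mean_ms in top_calls(samples):
    print(f"{name:<16} calls={count:<3} total={total_ms:>6} ms  mean={mean_ms:>7.1f} ms")
```

Sorting by total time rather than by the single slowest call is a choice on
my side: it also surfaces calls that are only moderately slow but happen
very often during room enter.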

Thanks
Sebastian

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5
Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Tue, 2 Feb 2021 at 17:18, [email protected] <[email protected]>
wrote:

> Have we ever looked into which java method would require the most
> resources/time during the process of entering the conference room ?
>
> Sebastian Wagner
> Director Arrakeen Solutions, OM-Hosting.com
> http://arrakeen-solutions.co.nz/
> https://om-hosting.com - Cloud & Server Hosting for HTML5
> Video-Conferencing OpenMeetings
>
> <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
>
>
> On Tue, 2 Feb 2021 at 16:48, Maxim Solodovnik <[email protected]>
> wrote:
>
>> While doing load testing I did the following:
>>
>> I created a JMeter test that loads a "semistatic" stateless error page with
>> 300 simultaneous threads (I can share this test, it is very simple).
>> CPU usage of the OM process was close to 100%.
>> The situation is better if Tomcat has more threads (maxThreads parameter).
>>
>> I guess we need to check "The Ultimate Tomcat Performance Guide" :)))
>>
>> On Tue, 2 Feb 2021 at 10:41, [email protected] <[email protected]
>> >
>> wrote:
>>
>> > Also the spikes are on the CPU actually more than on the memory:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021
>> >
>> > The spike is just 50-60 users.
>> >
>> > Why would CPU spike to almost 100% just for that amount of users ?
>> >
>> > I can try with 4GB for Openmeetings and repeat the test.
>> >
>> > Thanks
>> > Seb
>> >
>> > Sebastian Wagner
>> > Director Arrakeen Solutions, OM-Hosting.com
>> > http://arrakeen-solutions.co.nz/
>> > https://om-hosting.com - Cloud & Server Hosting for HTML5
>> > Video-Conferencing OpenMeetings
>> > <
>> >
>> https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url
>> > >
>> > <
>> >
>> https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url
>> > >
>> >
>> >
>> > On Tue, 2 Feb 2021 at 16:34, Maxim Solodovnik <[email protected]>
>> > wrote:
>> >
>> > > On Tue, 2 Feb 2021 at 10:30, [email protected] <
>> > [email protected]>
>> > > wrote:
>> > >
>> > > > I think what you mean is you have OpenMeetings and MySQL and KMS on
>> one
>> > > > instance with 4GB.
>> > > >
>> > > > But it's 2GB just for OpenMeetings.
>> > > >
>> > >
>> > > I mean
>> > > 4GB just for OM (demo-next)
>> > > 8GB just for OM (demo-prod)
>> > > and this might need to be increased in case of many users
>> > >
>> > > Additionally Tomcat's maxThreads might need to be increased here:
>> > >
>> > >
>> >
>> https://github.com/apache/openmeetings/blob/master/openmeetings-server/src/main/assembly/conf/server.xml#L74
>> > >
>> > > I suspect lots of simultaneous users need more resources
>> > >
>> > >
>> > > KMS is separated with another 2GB
>> > > > MySQL is on another server with another 2GB
>> > > > So that would be 6GB in total. But only 2 are allocated to
>> > OpenMeetings.
>> > > >
>> > > > XmX=2GB for OpenMeetings should be enough and not crash with 50-60
>> > users
>> > > > entering the room at the same time.
>> > > >
>> > > > Thanks
>> > > > Sebastian
>> > > >
>> > > > Sebastian Wagner
>> > > > Director Arrakeen Solutions, OM-Hosting.com
>> > > > http://arrakeen-solutions.co.nz/
>> > > > https://om-hosting.com - Cloud & Server Hosting for HTML5
>> > > > Video-Conferencing OpenMeetings
>> > > > <
>> > > >
>> > >
>> >
>> https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url
>> > > > >
>> > > > <
>> > > >
>> > >
>> >
>> https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url
>> > > > >
>> > > >
>> > > >
>> > > > On Tue, 2 Feb 2021 at 16:26, Maxim Solodovnik <[email protected]
>> >
>> > > > wrote:
>> > > >
>> > > > > Hello Sebastian,
>> > > > >
>> > > > > It seems 2GB of RAM is not enough for OM
>> > > > >       `OutOfMemoryError: Container killed due to memory usage`
>> > > > > I never use less than 4GB (8-16GB in production)
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, 2 Feb 2021 at 09:54, Maxim Solodovnik <
>> [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Tue, 2 Feb 2021 at 07:23, [email protected] <
>> > > > > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > >> Hi,
>> > > > > >>
>> > > > > >> I have been conducting a few more performance and load tests
>> with
>> > > the
>> > > > > goal
>> > > > > >> of increasing participants to 100++.
>> > > > > >>
>> > > > > >> The challenge is:
>> > > > > >> *If more than 50-60 users dynamically create a room Hash (using
>> > > > > Soap/Rest
>> > > > > >> API) and use that Hash to enter the conference room CPU and
>> memory
>> > > > > spikes
>> > > > > >> and server crashes*
>> > > > > >>
>> > > > > >
>> > > > > > Can you share API call sequence?
>> > > > > > Maybe we can write JMeter scenario for this?
>> > > > > >
>> > > > > > server crash is something bad
>> > > > > > What is happening? Is it a JVM crash? Or is the system low of
>> > > resources
>> > > > > > and the kernel kills the trouble-maker?
>> > > > > >
>> > > > > >
>> > > > > >> *Test scenario observations:*
>> > > > > >>  - It does not matter if those users try to enter the same
>> room or
>> > > > > >> separate
>> > > > > >> rooms. In the above test scenario it's a mix of 4x4 conference
>> > rooms
>> > > > and
>> > > > > >> 20x1 webinars
>> > > > > >>  - This can be reproduced stable and repetitively
>> > > > > >>  - The issue starts with API calls taking 10sec++ and getting
>> more
>> > > > > slower.
>> > > > > >> Until the OpenMeetings Tomcat instance crashes
>> > > > > >>  - The issue also manifests that -BEFORE- the server crashes
>> you
>> > can
>> > > > see
>> > > > > >> video pods not completing the initialisation in the conference
>> > room
>> > > > > >> itself.
>> > > > > >> For example missing video pods or video pods without a webcam
>> > > stream.
>> > > > > >> Likely to be linked to slow running API or web-socket calls
>> > > > > >> => I can deliver data samples or screenshots if required via
>> our
>> > > > > >> confluence
>> > > > > >> space.
>> > > > > >>
>> > > > > >> *Hardware and software:*
>> > > > > >>  - Server and OpenMeetings Instance is isolated on a separated
>> > > > hardware
>> > > > > >> and
>> > > > > >> has 2GB of memory allocated
>> > > > > >>  - There is no spike on KMS or Database hardware/CPU/memory.
>> The
>> > > spike
>> > > > > is
>> > > > > >> only in the OpenMeetings Tomcat Server instance
>> > > > > >>
>> > > > > >> *Possible ways to mitigate without code changes:*
>> > > > > >>  - You can mitigate part of this issue if you spread the users
>> to
>> > > > enter
>> > > > > >> over a longer time period. However it needs more than 10min
>> > > separation
>> > > > > to
>> > > > > >> enter without issues for 50-60 participants
>> > > > > >>  - You can mitigate part of this issue if you for example
>> create
>> > the
>> > > > > >> room-hash in a different process (like 1h before using) and
>> once
>> > all
>> > > > > >> hashes
>> > > > > >> are created you enter the conference room. It still leads to
>> > issues,
>> > > > but
>> > > > > >> you can enter up to 100 users within 5-10min, if you just use
>> the
>> > > > links,
>> > > > > >> rather than create the link AND entering with the link at the
>> same
>> > > > > >> time/process
>> > > > > >>  - Increasing Tomcat to more than 2GB of memory per Tomcat
>> > instance
>> > > > may
>> > > > > >> help, not sure by how much though
>> > > > > >>
>> > > > > >>  => I think we should spend further time and propose ways to
>> get
>> > rid
>> > > > of
>> > > > > >> those spikes. The mitigations are not realistic to really be
>> able
>> > to
>> > > > use
>> > > > > >> in
>> > > > > >> practise.
>> > > > > >>
>> > > > > >> *My proposal is:*
>> > > > > >> There is further analysis needed:
>> > > > > >>  - Capture all OpenMeetings calls that happen during the create
>> > room
>> > > > > hash
>> > > > > >> and conference room-enter
>> > > > > >>  - Measure call lengths and any calls during the create room
>> hash
>> > > and
>> > > > > >> conference room-enter and specific CPU spikes or memory usage
>> > based
>> > > > on a
>> > > > > >> per call basis
>> > > > > >>  - Eventually get a stack trace or have a profile available
>> that
>> > > > exports
>> > > > > >> the current in memory objects to review where and what create
>> > those
>> > > > > spikes
>> > > > > >>
>> > > > > >> Once a per-call analysis is there it should be a lot more easy
>> to
>> > > > > pinpoint
>> > > > > >> specific issues and propose improvements.
>> > > > > >>
>> > > > > >> As with all performance optimisation this is likely to need
>> more
>> > > > > >> discussion
>> > > > > >> once more detailed data is available.
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Sebastian
>> > > > > >>
>> > > > > >> Sebastian Wagner
>> > > > > >> Director Arrakeen Solutions, OM-Hosting.com
>> > > > > >> http://arrakeen-solutions.co.nz/
>> > > > > >> https://om-hosting.com - Cloud & Server Hosting for HTML5
>> > > > > >> Video-Conferencing OpenMeetings
>> > > > > >> <
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url
>> > > > > >> >
>> > > > > >> <
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>> https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best regards,
>> > > > > > Maxim
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards,
>> > > > > Maxim
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best regards,
>> > > Maxim
>> > >
>> >
>>
>>
>> --
>> Best regards,
>> Maxim
>>
>
