Re: Performance & Load Testing Results and next steps - Improving OpenMeetings sign in and room enter performance

[email protected] Tue, 02 Feb 2021 01:00:59 -0800

I can try a re-run. How many threads would you recommend trying for this
scenario?


Thanks
Seb

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5
Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Tue, 2 Feb 2021 at 21:56, Maxim Solodovnik <[email protected]> wrote:

> Have you tried to increase maxThreads for Tomcat?
>
> On Tue, 2 Feb 2021 at 15:26, [email protected] <[email protected]>
> wrote:
>
> > I doubled it to 4GB for OpenMeetings and 4GB for KMS. I updated the Docker
> > instance to run OpenMeetings with Xms=2GB and Xmx=4GB.
> >
> > And I ran exactly the same test again:
> >  - 50-60 users
> >  - staggered to enter over a period of around 5-10min
> >  - distributed into 10 conference rooms 4x4 and 2 webinars with 20 users
> > each
> >  - each test run calls the API to login/createRoomHash and then loads the
> > URL with the room (plus starts the webcam/audio stream in the conference
> > rooms)
> >
> > The results look almost the same. There is hardly any improvement:
> >
> >    - CPU still spikes to almost 100%, memory is not a problem
> >    - Empty video pods as well as video pods where webcam stream didn't
> > start
> >
> > There isn't a crash, but that is mostly because I stagger the users to
> > enter the server over a 5-10min period, which didn't crash the 2GB
> > instance either.
> >
> > Comparison of the CPU graphs of both hardware configurations and test
> > runs:
> >
> >
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021
> >
> > There is pretty much no improvement.
> >
> > Some work is needed on the application side. This does not look like it
> > will get better by throwing more hardware at it.
> >
> > It is really quite limiting to have no logs of any performance indicators,
> > such as call length, to narrow down where the bottleneck is.
> > You may find some very low-hanging fruit in terms of optimisation if you
> > simply concentrate on the top ten calls and optimise those, rather than
> > looking at CPU and memory graphs.
> >
> > Thanks
> > Sebastian
> >
> > On Tue, 2 Feb 2021 at 17:18, [email protected] <
> [email protected]>
> > wrote:
> >
> > > Have we ever looked into which Java method requires the most
> > > resources/time during the process of entering the conference room?
> > >
> > > On Tue, 2 Feb 2021 at 16:48, Maxim Solodovnik <[email protected]>
> > > wrote:
> > >
> > >> While doing load testing I did the following:
> > >>
> > >> created a JMeter test loading a "semistatic" stateless error page with
> > >> 300 simultaneous threads (I can share this test, it is very simple)
> > >> CPU usage of the OM process was near 100%
> > >> the situation is better if Tomcat has more threads (maxThreads
> > >> parameter)
> > >>
> > >> I guess we need to check "The Ultimate Tomcat Performance Guide" :)))
> > >>
> > >> On Tue, 2 Feb 2021 at 10:41, [email protected] <
> > [email protected]
> > >> >
> > >> wrote:
> > >>
> > >> > Also, the spikes are actually more on the CPU than on the memory:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021
> > >> >
> > >> > The spike occurs with just 50-60 users.
> > >> >
> > >> > Why would the CPU spike to almost 100% for just that number of users?
> > >> >
> > >> > I can try with 4GB for OpenMeetings and repeat the test.
> > >> >
> > >> > Thanks
> > >> > Seb
> > >> >
> > >> > On Tue, 2 Feb 2021 at 16:34, Maxim Solodovnik <[email protected]
> >
> > >> > wrote:
> > >> >
> > >> > > On Tue, 2 Feb 2021 at 10:30, [email protected] <
> > >> > [email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > I think what you mean is you have OpenMeetings and MySQL and KMS
> > >> > > > on one instance with 4GB.
> > >> > > >
> > >> > > > But it's 2GB just for OpenMeetings.
> > >> > > >
> > >> > >
> > >> > > I mean
> > >> > > 4GB just for OM (demo-next)
> > >> > > 8GB just for OM (demo-prod)
> > >> > > and this might need to be increased in case of many users
> > >> > >
> > >> > > Additionally Tomcat's maxThreads might need to be increased here:
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/openmeetings/blob/master/openmeetings-server/src/main/assembly/conf/server.xml#L74
> > >> > >
> > >> > > I suspect lots of simultaneous users need more resources
> > >> > >
> > >> > >
> > >> > > > KMS is separate, with another 2GB
> > >> > > > MySQL is on another server with another 2GB
> > >> > > > So that would be 6GB in total. But only 2GB are allocated to
> > >> > > > OpenMeetings.
> > >> > > >
> > >> > > > Xmx=2GB for OpenMeetings should be enough and not crash with 50-60
> > >> > > > users entering the room at the same time.
> > >> > > >
> > >> > > > Thanks
> > >> > > > Sebastian
> > >> > > >
> > >> > > > On Tue, 2 Feb 2021 at 16:26, Maxim Solodovnik <
> > [email protected]
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hello Sebastian,
> > >> > > > >
> > >> > > > > It seems 2GB of RAM is not enough for OM
> > >> > > > >       `OutOfMemoryError: Container killed due to memory usage`
> > >> > > > > I never use less than 4GB (8-16GB in production)
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Tue, 2 Feb 2021 at 09:54, Maxim Solodovnik <
> > >> [email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Tue, 2 Feb 2021 at 07:23, [email protected] <
> > >> > > > > [email protected]>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > >> Hi,
> > >> > > > > >>
> > >> > > > > >> I have been conducting a few more performance and load tests
> > >> > > > > >> with the goal of increasing participants to 100++.
> > >> > > > > >>
> > >> > > > > >> The challenge is:
> > >> > > > > >> *If more than 50-60 users dynamically create a room hash
> > >> > > > > >> (using the SOAP/REST API) and use that hash to enter the
> > >> > > > > >> conference room, CPU and memory spike and the server crashes*
> > >> > > > > >>
> > >> > > > > >
> > >> > > > > > Can you share API call sequence?
> > >> > > > > > Maybe we can write JMeter scenario for this?
> > >> > > > > >
> > >> > > > > > server crash is something bad
> > >> > > > > > What is happening? Is it a JVM crash? Or is the system low on
> > >> > > > > > resources and the kernel kills the trouble-maker?
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >> *Test scenario observations:*
> > >> > > > > >>  - It does not matter if those users try to enter the same room
> > >> > > > > >> or separate rooms. In the above test scenario it's a mix of
> > >> > > > > >> 4x4 conference rooms and 20x1 webinars
> > >> > > > > >>  - This can be reproduced reliably and repeatedly
> > >> > > > > >>  - The issue starts with API calls taking 10sec++ and getting
> > >> > > > > >> slower, until the OpenMeetings Tomcat instance crashes
> > >> > > > > >>  - The issue also manifests -BEFORE- the server crashes: you
> > >> > > > > >> can see video pods not completing the initialisation in the
> > >> > > > > >> conference room itself, for example missing video pods or
> > >> > > > > >> video pods without a webcam stream. This is likely linked to
> > >> > > > > >> slow-running API or web-socket calls
> > >> > > > > >> => I can deliver data samples or screenshots if required via
> > >> > > > > >> our Confluence space.
> > >> > > > > >>
> > >> > > > > >> *Hardware and software:*
> > >> > > > > >>  - The server and OpenMeetings instance is isolated on separate
> > >> > > > > >> hardware and has 2GB of memory allocated
> > >> > > > > >>  - There is no spike on KMS or database hardware/CPU/memory.
> > >> > > > > >> The spike is only in the OpenMeetings Tomcat server instance
> > >> > > > > >>
> > >> > > > > >> *Possible ways to mitigate without code changes:*
> > >> > > > > >>  - You can mitigate part of this issue if you spread the users
> > >> > > > > >> to enter over a longer time period. However, it needs more
> > >> > > > > >> than 10min of separation for 50-60 participants to enter
> > >> > > > > >> without issues
> > >> > > > > >>  - You can mitigate part of this issue if you, for example,
> > >> > > > > >> create the room hash in a different process (like 1h before
> > >> > > > > >> using it) and enter the conference room once all hashes are
> > >> > > > > >> created. It still leads to issues, but you can enter up to
> > >> > > > > >> 100 users within 5-10min if you just use the links, rather
> > >> > > > > >> than creating the link AND entering with the link at the same
> > >> > > > > >> time/process
> > >> > > > > >>  - Increasing Tomcat to more than 2GB of memory per Tomcat
> > >> > > > > >> instance may help, not sure by how much though
> > >> > > > > >>
> > >> > > > > >>  => I think we should spend further time and propose ways to
> > >> > > > > >> get rid of those spikes. The mitigations are not realistic
> > >> > > > > >> enough to really be usable in practice.
> > >> > > > > >>
> > >> > > > > >> *My proposal is:*
> > >> > > > > >> Further analysis is needed:
> > >> > > > > >>  - Capture all OpenMeetings calls that happen during the create
> > >> > > > > >> room hash and conference room-enter
> > >> > > > > >>  - Measure call lengths for any calls during the create room
> > >> > > > > >> hash and conference room-enter, and specific CPU spikes or
> > >> > > > > >> memory usage on a per-call basis
> > >> > > > > >>  - Eventually get a stack trace or have a profile available
> > >> > > > > >> that exports the current in-memory objects to review where
> > >> > > > > >> and what creates those spikes
> > >> > > > > >>
> > >> > > > > >> Once a per-call analysis is there it should be a lot easier to
> > >> > > > > >> pinpoint specific issues and propose improvements.
> > >> > > > > >>
> > >> > > > > >> As with all performance optimisation, this is likely to need
> > >> > > > > >> more discussion once more detailed data is available.
> > >> > > > > >>
> > >> > > > > >> Thanks,
> > >> > > > > >> Sebastian
> > >> > > > > >>
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Best regards,
> > >> > > > > > Maxim
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > Best regards,
> > >> > > > > Maxim
> > >> > > > >
> > >> > > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Best regards,
> > >> > > Maxim
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> Best regards,
> > >> Maxim
> > >>
> > >
> >
>
>
> --
> Best regards,
> Maxim
>
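
The per-call analysis proposed earlier in this thread (capture the calls
made during create-room-hash and room-enter, measure call lengths, then
concentrate on the top ten) could be sketched roughly as below. This is only
a minimal illustration in Python, not OpenMeetings code: the `CallTimer`
class is hypothetical, the call names are labels chosen for the example, and
the durations in the demo are made-up values, not measurements from the
tests discussed above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class CallTimer:
    """Hypothetical helper: collects wall-clock durations per named call."""

    def __init__(self):
        # call name -> list of observed durations in seconds
        self._durations = defaultdict(list)

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and store the result under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self._durations[name].append(time.perf_counter() - start)

    def record(self, name, seconds):
        """Store a duration that was measured elsewhere (e.g. parsed from a log)."""
        self._durations[name].append(seconds)

    def top(self, n=10):
        """Return up to n rows (name, count, total_seconds, max_seconds),
        sorted by total time spent, largest first."""
        rows = [
            (name, len(values), sum(values), max(values))
            for name, values in self._durations.items()
        ]
        rows.sort(key=lambda row: row[2], reverse=True)
        return rows[:n]


if __name__ == "__main__":
    timer = CallTimer()

    # Made-up sample durations, for illustration only.
    timer.record("login", 0.5)
    timer.record("createRoomHash", 2.0)
    timer.record("createRoomHash", 4.0)

    # Timing a block of code directly.
    with timer.measure("roomEnter"):
        time.sleep(0.01)

    for name, count, total, worst in timer.top(10):
        print(f"{name}: calls={count} total={total:.3f}s max={worst:.3f}s")
```

Ranking by total time rather than by the single slowest call is a design
choice of this sketch: a cheap call made very often can cost more overall
than one rare slow call.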
