Re: Performance & Load Testing Results and next steps - Improving OpenMeetings sign in and room enter performance

[email protected] Tue, 02 Feb 2021 01:13:22 -0800

It says default maxThreads is 200
https://tomcat.apache.org/tomcat-9.0-doc/config/executor.html


So I can try with 400 maybe to double that.
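
As a rough sanity check on whether the thread pool can be the first limit at all, Little's law (requests in flight = arrival rate x time per request) gives an estimate. This is only a minimal sketch: the 10 s call duration and the 50-60 users come from the test observations, but the per-user request rate is an assumed placeholder that was not measured.

```python
import math

def busy_threads(requests_per_second: float, seconds_per_request: float) -> int:
    """Little's law: average requests in flight = arrival rate * time in system."""
    return math.ceil(requests_per_second * seconds_per_request)

# Hypothetical inputs, not measured values from the test runs.
users = 55                   # midpoint of the 50-60 users in the test
calls_per_user_per_s = 0.5   # ASSUMED request rate per user while entering a room
call_duration_s = 10.0       # API calls were seen taking 10 s and more

in_flight = busy_threads(users * calls_per_user_per_s, call_duration_s)
for max_threads in (150, 200, 300, 400):
    verdict = "enough" if in_flight <= max_threads else "saturated"
    print(f"maxThreads={max_threads}: {in_flight} requests in flight -> {verdict}")
```

If the CPU is already at 100%, more threads probably only move the queue from Tomcat's backlog into the JVM, so this says nothing about whether the calls themselves get faster.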

Thanks
Seb

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5
Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Tue, 2 Feb 2021 at 22:11, [email protected] <[email protected]>
wrote:

> I will have a look with 300 and repeat it.
>
>
> BTW are you using dockerized OM? how are you passing `xmx` via
> CATALINA_OPTS
> ?
> => I have a custom OpenMeetings Docker container and I set those via
> CATALINA_OPTS, which are passed into the OpenMeetings instance.
> I can see in the catalina.out logs that it reads the values in and uses them.
>
> Are you setting additional memory for docker?
> => The Docker container itself also has 4GB memory available.
>
> If you compare the graphs from the 2GB and 4GB test you can see that
> memory usage in % has dropped by exactly 50%. So it seems pretty convincing
> that those settings are all correctly applied.
>
> Thanks
> Seb
>
>
> On Tue, 2 Feb 2021 at 22:04, Maxim Solodovnik <[email protected]>
> wrote:
>
>> the default is 150
>> could you set to 300?
>> we will see if there is an improvement
>>
>> BTW are you using dockerized OM? how are you passing `xmx` via
>> CATALINA_OPTS
>> ?
>> Are you setting additional memory for docker?
>>
>> On Tue, 2 Feb 2021 at 16:00, [email protected] <[email protected]
>> >
>> wrote:
>>
>> > I can try and re-run. How many threads would you recommend trying for this
>> > scenario?
>> >
>> > Thanks
>> > Seb
>> >
>> >
>> > On Tue, 2 Feb 2021 at 21:56, Maxim Solodovnik <[email protected]>
>> > wrote:
>> >
>> > > Have you tried to increase maxThreads for Tomcat?
>> > >
>> > > On Tue, 2 Feb 2021 at 15:26, [email protected] <
>> > [email protected]>
>> > > wrote:
>> > >
>> > > > I doubled it to 4GB OpenMeetings and 4GB KMS. I updated the Docker
>> > > > instance to run OpenMeetings with Xms=2GB and Xmx=4GB.
>> > > >
>> > > > And I ran exactly the same test again:
>> > > >  - 50-60 users
>> > > >  - staggered to enter in a time period around 5-10min
>> > > >  - distributed into 10 conference rooms 4x4 and 2 webinars with 20
>> > > >    users each
>> > > >  - each test run calls the API to login/createRoomHash and then loads
>> > > >    the URL with the room (plus start webcam/audio stream in the
>> > > >    conference rooms)
>> > > >
>> > > > The results look almost the same. There is hardly any improvement:
>> > > >
>> > > >    - CPU still spikes to almost 100%, memory is not a problem
>> > > >    - Empty video pods as well as video pods where the webcam stream
>> > > >      didn't start
>> > > >
>> > > > There isn't a crash, but that is mostly because I stagger the users
>> > > > entering the server over a 5-10min period, which didn't crash the 2GB
>> > > > instance either.
>> > > >
>> > > > Comparison of the CPU graphs of both hardware configurations and test
>> > > > runs:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021
>> > > >
>> > > > There is pretty much no improvement.
>> > > >
>> > > > Some work on the application side is needed. This does not look like
>> > > > it is getting better by throwing more hardware at it.
>> > > >
>> > > > It is really quite limiting to have no logs of any performance
>> > > > indicators, like call length, to narrow down where the bottleneck is.
>> > > > You may find some very low-hanging fruit in terms of optimisation if
>> > > > you simply concentrate on the top ten calls and optimise those,
>> > > > rather than looking at CPU and memory graphs.
>> > > >
>> > > > Thanks
>> > > > Sebastian
>> > > >
>> > > >
>> > > > On Tue, 2 Feb 2021 at 17:18, [email protected] <
>> > > [email protected]>
>> > > > wrote:
>> > > >
>> > > > > Have we ever looked into which Java method requires the most
>> > > > > resources/time during the process of entering the conference room?
>> > > > >
>> > > > >
>> > > > > On Tue, 2 Feb 2021 at 16:48, Maxim Solodovnik <
>> [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > >> While doing load testing I did the following:
>> > > > >>
>> > > > >> created a JMeter test loading a "semistatic" stateless error page with
>> > > > >> 300 simultaneous threads (I can share this test, it is very simple)
>> > > > >> CPU usage of the OM process was near 100%
>> > > > >> the situation is better if Tomcat has more threads (maxThreads
>> > > > >> parameter)
>> > > > >>
>> > > > >> I guess we need to check "The Ultimate Tomcat Performance Guide" :)))
>> > > > >>
>> > > > >> On Tue, 2 Feb 2021 at 10:41, [email protected] <
>> > > > [email protected]
>> > > > >> >
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Also, the spikes are actually more on the CPU than on the memory:
>> > > > >> >
>> > > > >> > https://cwiki.apache.org/confluence/display/OPENMEETINGS/Performance+Testing#PerformanceTesting-ClusterPerformancetestresult02-022021
>> > > > >> >
>> > > > >> > The spike occurs with just 50-60 users.
>> > > > >> >
>> > > > >> > Why would CPU spike to almost 100% for just that number of users?
>> > > > >> >
>> > > > >> > I can try with 4GB for Openmeetings and repeat the test.
>> > > > >> >
>> > > > >> > Thanks
>> > > > >> > Seb
>> > > > >> >
>> > > > >> >
>> > > > >> > On Tue, 2 Feb 2021 at 16:34, Maxim Solodovnik <
>> > [email protected]
>> > > >
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > On Tue, 2 Feb 2021 at 10:30, [email protected] <
>> > > > >> > [email protected]>
>> > > > >> > > wrote:
>> > > > >> > >
>> > > > >> > > > I think what you mean is you have OpenMeetings and MySQL and KMS
>> > > > >> > > > on one instance with 4GB.
>> > > > >> > > >
>> > > > >> > > > But it's 2GB just for OpenMeetings.
>> > > > >> > > >
>> > > > >> > >
>> > > > >> > > I mean
>> > > > >> > > 4GB just for OM (demo-next)
>> > > > >> > > 8GB just for OM (demo-prod)
>> > > > >> > > and this might need to be increased in case of many users
>> > > > >> > >
>> > > > >> > > Additionally Tomcat's maxThreads might need to be increased here:
>> > > > >> > >
>> > > > >> > > https://github.com/apache/openmeetings/blob/master/openmeetings-server/src/main/assembly/conf/server.xml#L74
>> > > > >> > >
>> > > > >> > > I suspect lots of simultaneous users need more resources
>> > > > >> > >
>> > > > >> > >
>> > > > >> > > > KMS is separated with another 2GB
>> > > > >> > > > MySQL is on another server with another 2GB
>> > > > >> > > > So that would be 6GB in total. But only 2 are allocated to
>> > > > >> > > > OpenMeetings.
>> > > > >> > > >
>> > > > >> > > > Xmx=2GB for OpenMeetings should be enough and not crash with 50-60
>> > > > >> > > > users entering the room at the same time.
>> > > > >> > > >
>> > > > >> > > > Thanks
>> > > > >> > > > Sebastian
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Tue, 2 Feb 2021 at 16:26, Maxim Solodovnik <
>> > > > [email protected]
>> > > > >> >
>> > > > >> > > > wrote:
>> > > > >> > > >
>> > > > >> > > > > Hello Sebastian,
>> > > > >> > > > >
>> > > > >> > > > > It seems 2GB of RAM is not enough for OM
>> > > > >> > > > >       `OutOfMemoryError: Container killed due to memory usage`
>> > > > >> > > > > I never use less than 4GB (8-16GB in production)
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > On Tue, 2 Feb 2021 at 09:54, Maxim Solodovnik <
>> > > > >> [email protected]>
>> > > > >> > > > > wrote:
>> > > > >> > > > >
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > > On Tue, 2 Feb 2021 at 07:23, [email protected] <
>> > > > >> > > > > [email protected]>
>> > > > >> > > > > > wrote:
>> > > > >> > > > > >
>> > > > >> > > > > >> Hi,
>> > > > >> > > > > >>
>> > > > >> > > > > >> I have been conducting a few more performance and load tests with the
>> > > > >> > > > > >> goal of increasing participants to 100++.
>> > > > >> > > > > >>
>> > > > >> > > > > >> The challenge is:
>> > > > >> > > > > >> *If more than 50-60 users dynamically create a room hash (using the
>> > > > >> > > > > >> SOAP/REST API) and use that hash to enter the conference room, CPU and
>> > > > >> > > > > >> memory spike and the server crashes*
>> > > > >> > > > > >>
>> > > > >> > > > > >
>> > > > >> > > > > > Can you share the API call sequence?
>> > > > >> > > > > > Maybe we can write a JMeter scenario for this?
>> > > > >> > > > > >
>> > > > >> > > > > > A server crash is something bad.
>> > > > >> > > > > > What is happening? Is it a JVM crash? Or is the system low on
>> > > > >> > > > > > resources and the kernel kills the trouble-maker?
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > >> *Test scenario observations:*
>> > > > >> > > > > >>  - It does not matter if those users try to enter the same room or
>> > > > >> > > > > >>    separate rooms. In the above test scenario it's a mix of 4x4
>> > > > >> > > > > >>    conference rooms and 20x1 webinars
>> > > > >> > > > > >>  - This can be reproduced reliably and repeatedly
>> > > > >> > > > > >>  - The issue starts with API calls taking 10sec++ and getting slower,
>> > > > >> > > > > >>    until the OpenMeetings Tomcat instance crashes
>> > > > >> > > > > >>  - The issue also manifests -BEFORE- the server crashes: you can see
>> > > > >> > > > > >>    video pods not completing the initialisation in the conference room
>> > > > >> > > > > >>    itself, for example missing video pods or video pods without a
>> > > > >> > > > > >>    webcam stream. This is likely linked to slow-running API or
>> > > > >> > > > > >>    web-socket calls
>> > > > >> > > > > >> => I can deliver data samples or screenshots if required via our
>> > > > >> > > > > >>    Confluence space.
>> > > > >> > > > > >>
>> > > > >> > > > > >> *Hardware and software:*
>> > > > >> > > > > >>  - The server and OpenMeetings instance is isolated on separate
>> > > > >> > > > > >>    hardware and has 2GB of memory allocated
>> > > > >> > > > > >>  - There is no spike on KMS or database hardware/CPU/memory. The spike
>> > > > >> > > > > >>    is only in the OpenMeetings Tomcat server instance
>> > > > >> > > > > >>
>> > > > >> > > > > >> *Possible ways to mitigate without code changes:*
>> > > > >> > > > > >>  - You can mitigate part of this issue if you spread the users to enter
>> > > > >> > > > > >>    over a longer time period. However, it needs more than 10min of
>> > > > >> > > > > >>    separation for 50-60 participants to enter without issues
>> > > > >> > > > > >>  - You can mitigate part of this issue if you, for example, create the
>> > > > >> > > > > >>    room hash in a different process (like 1h before using it) and enter
>> > > > >> > > > > >>    the conference room once all hashes are created. It still leads to
>> > > > >> > > > > >>    issues, but you can enter up to 100 users within 5-10min if you just
>> > > > >> > > > > >>    use the links, rather than creating the link AND entering with the
>> > > > >> > > > > >>    link at the same time/in the same process
>> > > > >> > > > > >>  - Increasing Tomcat to more than 2GB of memory per Tomcat instance may
>> > > > >> > > > > >>    help, not sure by how much though
>> > > > >> > > > > >>
>> > > > >> > > > > >>  => I think we should spend further time and propose ways to get rid of
>> > > > >> > > > > >> those spikes. The mitigations are not realistic to use in practice.
>> > > > >> > > > > >>
>> > > > >> > > > > >> *My proposal is:*
>> > > > >> > > > > >> Further analysis is needed:
>> > > > >> > > > > >>  - Capture all OpenMeetings calls that happen during create room hash
>> > > > >> > > > > >>    and conference room enter
>> > > > >> > > > > >>  - Measure call lengths for any calls during create room hash and
>> > > > >> > > > > >>    conference room enter, plus specific CPU spikes or memory usage on a
>> > > > >> > > > > >>    per-call basis
>> > > > >> > > > > >>  - Eventually get a stack trace or have a profile available that
>> > > > >> > > > > >>    exports the current in-memory objects to review where and what
>> > > > >> > > > > >>    creates those spikes
>> > > > >> > > > > >> Once a per-call analysis is there it should be a lot easier to pinpoint
>> > > > >> > > > > >> specific issues and propose improvements.
>> > > > >> > > > > >>
>> > > > >> > > > > >> As with all performance optimisation, this is likely to need more
>> > > > >> > > > > >> discussion once more detailed data is available.
>> > > > >> > > > > >>
>> > > > >> > > > > >> Thanks,
>> > > > >> > > > > >> Sebastian
>> > > > >> > > > > >>
>> > > > >> > > > > >>
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > > --
>> > > > >> > > > > > Best regards,
>> > > > >> > > > > > Maxim
>> > > > >> > > > > >
>
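
The per-call analysis proposed in the original message (capture the calls, measure call lengths, then concentrate on the top ten) could be sketched roughly as below. This is only an illustration under assumptions: it presumes some log or test client already yields (call name, duration) pairs, and the sample values and the `enterRoom` name are made-up placeholders, not real measurements or confirmed OpenMeetings endpoints.

```python
from collections import defaultdict

def top_calls(records, limit=10):
    """Rank calls by total time spent; records are (call_name, seconds) pairs."""
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for name, seconds in records:
        entry = stats[name]
        entry["count"] += 1
        entry["total"] += seconds
        entry["max"] = max(entry["max"], seconds)
    # Sort by accumulated time so the most expensive calls come first.
    ranked = sorted(stats.items(), key=lambda item: item[1]["total"], reverse=True)
    return ranked[:limit]

# Made-up sample durations in seconds; NOT taken from the actual test runs.
sample = [
    ("login", 0.4),
    ("createRoomHash", 9.8),
    ("login", 0.6),
    ("enterRoom", 3.1),       # placeholder name for the room-enter step
    ("createRoomHash", 12.5),
]

for name, s in top_calls(sample):
    print(f"{name:16} n={s['count']:3d} total={s['total']:7.2f}s max={s['max']:6.2f}s")
```

Ranking by total time rather than by the single slowest call is a deliberate choice here: a moderately slow call that every user triggers several times can cost more than one rare outlier.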
