My password was 100% correct, because it's a test script I run. There wasn't a single error during login.
And the method got invoked exactly 80 times (or 140 times in the bigger test); you can check my dashboards. You can see the number of calls and the rate in this graph (for the 140 users test):

http://54.162.44.21:5080/graph?g0.expr=sum(increase(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D))&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
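For reference, URL-decoded, the expression behind that graph is the per-minute increase of the call counter, summed over the user web-service paths:

    sum(increase(webapp_metrics_filter_count{path=~"/openmeetings/services/user/.+"}[1m]))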
Will invoking this method 140 times in a 5 minute period make the CPU spike to 95% and lead to 10 second call lengths? :)

Also, like I say, I have the same metric on UserDao::login, and it looks the same.

I don't think so.

Thanks
Seb

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Fri, 5 Feb 2021 at 18:44, Maxim Solodovnik <[email protected]> wrote:

> Can it be the password hash function?
> It should be slow (for security reasons)
> And it should be even slower if a wrong password was provided
>
> On Fri, 5 Feb 2021 at 12:42, [email protected] <[email protected]> wrote:
>
> > It is login via the API: /services/users/login/
> >
> > But: I've put the metric also on UserDao::login and the timings look identical.
> > So it shouldn't matter how you log in, I think.
> >
> > It just seems so strange that this method takes so long while RoomDao and others don't behave that way.
> >
> > Thanks
> > Seb
> >
> > Sebastian Wagner
> > Director Arrakeen Solutions, OM-Hosting.com
> > http://arrakeen-solutions.co.nz/
> > https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
> > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
> >
> > On Fri, 5 Feb 2021 at 18:38, Maxim Solodovnik <[email protected]> wrote:
> >
> > > sorry for top posting
> > >
> > > will re-read more carefully later
> > >
> > > What I would like to do as a first step:
> > > 1) add your index to the DB
> > > 2) analyze your login results and write a unit test for login
> > > was it login by hash or login by username/password?
> > >
> > > On Fri, 5 Feb 2021 at 12:28, [email protected] <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > as you noticed I added a branch with performance metrics, and did rerun the tests with similar and larger user numbers.
> > > >
> > > > Results
> > > >
> > > > - *80 users test*:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test
> > > > - *140 users test*:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test
> > > >
> > > > Findings
> > > >
> > > > *1 - Added index on address.email*
> > > >
> > > > I actually did the *80 users test* twice. I found that I could improve the performance of the login command by adding an index on address.email. See:
> > > > https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-CPUandmemoryusage
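As an aside on that index: declared on the JPA entity it could look roughly like the sketch below. The entity, table and column names here are assumptions for illustration and may not match the actual OpenMeetings mapping; the same index can of course also be created directly in the database.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Index;
    import javax.persistence.Table;

    // Sketch only: an extra index on the e-mail column, so the
    // "find user by e-mail" lookup during login does not scan the whole table.
    @Entity
    @Table(name = "address", indexes = {
        @Index(name = "idx_address_email", columnList = "email")
    })
    public class Address {
        @Id
        private Long id;

        @Column(name = "email")
        private String email;

        // ... remaining fields omitted
    }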
> > > > *2 - Login command performance gets increasingly bad with the number of users*
> > > >
> > > > While I could improve the UserService::login (or rather UserDao::login) timings with the index, *I then switched from 80 users to 140 users* and reran the tests: *the login command started to take 10+ seconds*.
> > > >
> > > > - I also used an actual log statement to verify my metrics, because it just seemed so strange!
> > > > - I also checked other methods: they are NOT affected, or at most increase very slightly. The application is NOT generally slow, but some part of it is!
> > > > - You can see how the duration increases during the test run here:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Loginwebservicecallhits10seconds
> > > > - And you can see here that it's not just the HTTP call that takes long, but the actual UserDao::login command:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-DatabaseUserDaologinmethodisbyfarthelongestrunningoneandhitsplus10seconds
> > > >
> > > > => Now this is kind of puzzling. Other methods don't degrade that badly; it's just this login command. The RoomDao methods, for example, do not perform badly, or at least not at that scale (by far not).
> > > >
> > > > *Questions:*
> > > >
> > > > - Is it possible that the login method, which fetches the entire user, has too many linked entities, so that even a simple fetch or login quickly starts to take a long time?
> > > > - Is it really possible that only this method takes so long, or am I measuring a general issue because the CPU spikes to 95%? Like I say, RoomDao seems fine; are there other methods I should measure?
> > > >
> > > > *3 - Tomcat threads unused or way below max*
> > > >
> > > > As you can see between those two graphs:
> > > >
> > > > - 80 users test active threads:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-Tomcatactivethreads
> > > > - 140 users test active threads:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Tomcatactivethreads
> > > >
> > > > => 140 users utilize more threads, YES. But it's WAY below the 400 available, not even close to that number. I don't think it's a threading issue. That also makes sense: given the test scenario, you probably have around 20-40 users hitting the server at the same time, trying to log in and enter the room.
> > > >
> > > > *4 - RoomPanel::onInitialise and RoomPanel::enterRoom*
> > > >
> > > > I put some metrics on those methods; you can see the results of the 140 users test run here:
> > > >
> > > > http://54.162.44.21:5080/graph?g0.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
> > > >
> > > > => It is definitely getting worse, BUT this is MILLISECONDS: it increased from 0.05 seconds to 0.3 seconds. That's getting slower, but not that bad.
> > > >
> > > > If you compare the same graph with the login command:
> > > >
> > > > http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
> > > >
> > > > => That is going from ~800ms to 12 seconds!
> > > >
> > > > It shows some similarity in the curve, but the durations are very different!
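For reference, URL-decoded, both duration graphs plot the average call duration over one minute, i.e. the rate of the summed durations divided by the rate of the call count; for the login graph that is:

    rate(webapp_metrics_filter_sum{path=~"/openmeetings/services/user/.+"}[1m])
    /
      rate(webapp_metrics_filter_count{path=~"/openmeetings/services/user/.+"}[1m])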
> > > > *5 - Missing video pods massively increased from 80 to 140 users*
> > > >
> > > > During the time when login is slow, the number of conference rooms where video pods are missing increases massively.
> > > >
> > > > That could be because the OpenMeetings server is generally slow, or because something is particularly slow in fetching the user entity or in the login command.
> > > >
> > > > *6 - Missing video is always an issue on the sender side*
> > > >
> > > > From my tests I could see that, if a video is missing, it is because a SENDER has:
> > > > A) not started the video, or
> > > > B) got stuck while the video was being published
> > > >
> > > > Like I say, case (A) MASSIVELY increased once the server had 140 users. But I think it's valuable to know that there seem to be TWO issues: one where video pods don't start up at all, and another where, once the video pod has started, the video stream somehow doesn't get triggered.
> > > >
> > > > Generally - digging through the above graphs and reports
> > > >
> > > > You can dig through all of the above metrics and test runs yourself. I will obviously delete those dashboards again shortly, but for now you can have a dig through them:
> > > >
> > > > - Dashboard for v5.1.0 with additional indexes on Address.email
> > > >   <http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2000%3A30%3A00&g0.moment_input=2021-02-05%2000%3A30%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2000%3A30%3A00&g1.moment_input=2021-02-05%2000%3A30%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2000%3A30%3A00&g2.moment_input=2021-02-05%2000%3A30%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2000%3A30%3A00&g3.moment_input=2021-02-05%2000%3A30%3A00>
> > > > - Test with 140 users at 2021-02-05 02:52:00
> > > >   <http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2002%3A52%3A00&g1.moment_input=2021-02-05%2002%3A52%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2002%3A52%3A00&g2.moment_input=2021-02-05%2002%3A52%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2002%3A52%3A00&g3.moment_input=2021-02-05%2002%3A52%3A00>
> > > >
> > > > If you go to those dashboards, it will probably help a lot to first have a look at:
> > > > https://cwiki.apache.org/confluence/display/OPENMEETINGS/Prometheus+Logging+and+Metrics
> > > >
> > > > *What else?*
> > > >
> > > > - What are we thinking about the above findings? Is the login method a performance issue, or just a symptom of a generally slow server? It doesn't really look like a general issue; the difference between UserDao::login and EVERYTHING else seems just too drastic.
> > > > - What other methods and calls should I try to trace and measure?
> > > > - Is there any event within initialising a video pod that I could measure and add a metric for?
> > > >
> > > > Any other ideas?
> > > >
> > > > Thanks
> > > > Sebastian
> > > >
> > > > Sebastian Wagner
> > > > Director Arrakeen Solutions, OM-Hosting.com
> > > > http://arrakeen-solutions.co.nz/
> > > > https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
> > > > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> > > > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
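On the question of what else to trace and measure: below is a minimal sketch of timing an arbitrary call with the plain Prometheus Java client (io.prometheus:simpleclient). The metric and class names are made up for illustration, and this is not necessarily how the metrics branch instruments things; anything wrapped this way can then be graphed with the same rate(_sum)/rate(_count) pattern as the graphs above.

    import java.util.function.Supplier;
    import io.prometheus.client.Histogram;

    // Sketch only: wrap any suspect call in a Prometheus histogram so its
    // duration is exposed as <name>_sum / <name>_count, like the existing metrics.
    public final class CallTimer {

        private static final Histogram DURATION = Histogram.build()
                .name("example_call_duration_seconds")
                .help("Duration of a timed call in seconds")
                .labelNames("method")
                .register();

        private CallTimer() {
        }

        public static <T> T timed(String method, Supplier<T> call) {
            Histogram.Timer timer = DURATION.labels(method).startTimer();
            try {
                return call.get();
            } finally {
                timer.observeDuration();
            }
        }
    }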
> > >
> > > --
> > > Best regards,
> > > Maxim
> >
>
> --
> Best regards,
> Maxim
>
