The index I already added; it was after adding the index that it started
taking 10+ seconds (because I increased the load from 80 to 140 users).

However, you could see a slight improvement in the login method (and also in
the overall CPU usage of the OpenMeetings server) once the index was added,
even when testing with 80 users.

However, it didn't help that much.
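
(For reference, the index is simply on address.email. Expressed as JPA
annotations it would look roughly like the sketch below - the table, column
and index names here are my assumptions, and in the test I actually created
the index directly in the DB:)

    import javax.persistence.*;

    @Entity
    @Table(name = "address", indexes = {
        // assumed index name; any name works, the column is what matters
        @Index(name = "idx_address_email", columnList = "email")
    })
    public class Address {
        @Id
        private Long id;

        @Column(name = "email")
        private String email;
    }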

Thanks
Seb

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5
Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Fri, 5 Feb 2021 at 18:42, [email protected] <[email protected]>
wrote:

> It is login via API: /services/users/login/
>
> But I've also put the metric on UserDao::login, and the timings look
> identical.
>
> So it wouldn't matter how you log in, I think.
>
> It just seems so strange that this method takes so long while RoomDao and
> others don't behave that way.
>
> Thanks
> Seb
>
> Sebastian Wagner
> Director Arrakeen Solutions, OM-Hosting.com
> http://arrakeen-solutions.co.nz/
> https://om-hosting.com - Cloud & Server Hosting for HTML5
> Video-Conferencing OpenMeetings
>
> <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
>
>
> On Fri, 5 Feb 2021 at 18:38, Maxim Solodovnik <[email protected]>
> wrote:
>
>> sorry for top posting
>>
>> will re-read more carefully later
>>
>> What I would like to do as first step:
>> 1) add your index to DB
>> 2) analyze your login results and write a unit test for login (rough
>> sketch below)
>> was it login by hash or login by username/password?
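>>
>> (roughly something like this - just a sketch, the actual DAO method
>> signature and test wiring are assumptions:)
>>
>>     import static org.junit.Assert.assertNotNull;
>>     import static org.junit.Assert.assertTrue;
>>
>>     import org.junit.Test;
>>
>>     // hypothetical base class that wires up the Spring context and userDao
>>     public class UserLoginTest extends AbstractOmTest {
>>         @Test
>>         public void loginShouldStayFast() throws Exception {
>>             long start = System.currentTimeMillis();
>>             // assumed signature: login by username/email + password
>>             assertNotNull(userDao.login("user@example.com", "password"));
>>             long tookMs = System.currentTimeMillis() - start;
>>             assertTrue("login took " + tookMs + " ms", tookMs < 1000);
>>         }
>>     }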
>>
>>
>> On Fri, 5 Feb 2021 at 12:28, [email protected] <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > As you noticed, I added a branch with performance metrics and did rerun
>> > the tests with similar and larger user numbers.
>> >
>> > Results
>> >
>> >    - *80 users test*:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test
>> >    - *140 users test*:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test
>> >
>> > Findings
>> >
>> > *1 - Added index on address.email*
>> >
>> > I actually did the *80 users test* twice. I found that I could improve the
>> > performance of the login command by adding an index on address.email.
>> > See:
>> >
>> > https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-CPUandmemoryusage
>> >
>> > *2 - Login command performance gets increasingly worse with the number of users*
>> >
>> > While I could improve UserService::login (or more generally UserDao::login)
>> > somewhat with the index, *I then switched from 80 users to 140 users* and
>> > reran the tests: *the login command started to take 10+ seconds.*
>> >
>> >    - I also used an actual log statement to verify my metrics, because it
>> >      just seemed so strange!
>> >    - I also checked other methods; they are NOT affected, or at most they
>> >      increase very slightly. The application is NOT generally slow, but
>> >      some part of it is!
>> >    - You can see how the duration increases during the test run here:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Loginwebservicecallhits10seconds
>> >    - And you can see here that it's not just the HTTP call that takes long
>> >      but the actual UserDao::login command:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-DatabaseUserDaologinmethodisbyfarthelongestrunningoneandhitsplus10seconds
>> >
>> > => Now this is kind of puzzling. Other methods don't degrade like that;
>> > it's just this login command. The RoomDao methods, for example, do not
>> > perform badly, or at least not at that scale (not by far).
>> >
>> > *Questions:*
>> >
>> >    - Is it possible that the login method, which fetches the entire user,
>> >      pulls in too many linked entities, so that even a simple fetch or
>> >      login quickly starts to take a long time? (See the sketch after this
>> >      list.)
>> >    - Is it really possible that only this method takes so long, or am I
>> >      measuring a general issue because the CPU spikes at 95%? Like I say,
>> >      RoomDao seems fine; are there other methods I should measure?
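>> >
>> >    Sketch for the first question above - if relations on the user entity
>> >    are eagerly fetched, every login drags them along (the entity and field
>> >    names here are made up, not the actual OpenMeetings mapping):
>> >
>> >        import java.util.List;
>> >        import javax.persistence.*;
>> >
>> >        @Entity
>> >        public class User {
>> >            @Id
>> >            private Long id;
>> >
>> >            // hypothetical linked collection: if relations like this are
>> >            // EAGER, every user fetch (and therefore every login) loads
>> >            // them as well; LAZY would defer that work
>> >            @OneToMany(fetch = FetchType.EAGER)
>> >            private List<SomeLinkedEntity> linked;
>> >        }
>> >
>> >        @Entity
>> >        class SomeLinkedEntity {
>> >            @Id
>> >            private Long id;
>> >        }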
>> >
>> > *3 - Tomcat threads unused or way below max*
>> >
>> > As you can see between those two graphs:
>> >
>> >    - 80 users test active threads:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-Tomcatactivethreads
>> >    - 140 users test active threads:
>> >      https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Tomcatactivethreads
>> >
>> > => 140 users utilize more threads, YES, but it's WAY below the 400
>> > available, not even close to that number. I don't think it's a threading
>> > issue. That also makes sense: given the test scenario, you probably have
>> > around 20-40 users hitting the server trying to log in and enter the room
>> > at the same time.
>> >
>> > *4 - RoomPanel:onInitialise and RoomPanel:enterRoom*
>> >
>> > I put some metrics on those methods; you can see the results of the 140
>> > users test run here:
>> >
>> >
>> http://54.162.44.21:5080/graph?g0.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
>> > => It is definitely getting worse, BUT this is MILLISECONDS: it increased
>> > from 0.05 seconds to 0.3 seconds. That's getting slower, but not that bad.
>> >
>> > If you compare the same graph with the login command:
>> >
>> >
>> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
>> > => That is going from ~800 ms to 12 seconds!
>> >
>> > The curves show some similarity, but the durations are very different!
>> >
>> > *5 - Missing video pods massively increased from 80 to 140 users*
>> >
>> > During the time when login is slow, the number of conference rooms where
>> > video pods are missing increases massively.
>> >
>> > That could be because the OpenMeetings server is generally slow, or
>> > because something is particularly slow in fetching the user entity or in
>> > the login command.
>> >
>> > *6 - Missing video is always an issue on the sender side*
>> >
>> > From my tests I could see that if a video is missing, it is because the
>> > SENDER has:
>> > A) Not started the video
>> > B) Got stuck while the video was being published
>> >
>> > Like I say, case (A) MASSIVELY increased once the server had 140 users.
>> > But I think it's valuable information to know that there seem to be TWO
>> > issues: one related to video pods not starting up, the other where, once
>> > the video pod has started, the video stream somehow doesn't get triggered.
>> >
>> >
>> > Generally - digging through the above graphs and reports
>> >
>> > You can dig through all of the above metrics and test runs yourself. I
>> > will obviously delete those dashboards again shortly, but for now you can
>> > have a dig through them:
>> >
>> >    - Dashboard for v5.1.0 with additional indexes on Address.email
>> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2000%3A30%3A00&g0.moment_input=2021-02-05%2000%3A30%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2000%3A30%3A00&g1.moment_input=2021-02-05%2000%3A30%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2000%3A30%3A00&g2.moment_input=2021-02-05%2000%3A30%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2000%3A30%3A00&g3.moment_input=2021-02-05%2000%3A30%3A00
>> >    - Test with 140 users at 2021-02-05 02:52:00
>> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2002%3A52%3A00&g1.moment_input=2021-02-05%2002%3A52%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2002%3A52%3A00&g2.moment_input=2021-02-05%2002%3A52%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2002%3A52%3A00&g3.moment_input=2021-02-05%2002%3A52%3A00
>> >
>> > If you go to those dashboards, it will probably help a lot if you first
>> > have a look at:
>> >
>> > https://cwiki.apache.org/confluence/display/OPENMEETINGS/Prometheus+Logging+and+Metrics
>> >
>> > *What else ????*
>> >
>> >
>> >    - What are we thinking about the above findings? Is the login method a
>> >      performance issue or just a symptom of a generally slow server? It
>> >      doesn't really look like a general issue; the difference between
>> >      UserDao::login and EVERYTHING else just seems too drastic.
>> >    - What other methods and calls should I try to trace and measure?
>> >    - Is there any event within initialising a video pod that I could
>> >      measure and add a metric for?
>> >
>> > Any other ideas?
>> >
>> > Thanks
>> > Sebastian
>> >
>> > Sebastian Wagner
>> > Director Arrakeen Solutions, OM-Hosting.com
>> > http://arrakeen-solutions.co.nz/
>> > https://om-hosting.com - Cloud & Server Hosting for HTML5
>> > Video-Conferencing OpenMeetings
>> > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
>> > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
>> >
>>
>>
>> --
>> Best regards,
>> Maxim
>>
>
