Sorry for top posting.

I will re-read it more carefully later.

What I would like to do as a first step:
1) add your index to the DB
2) analyze your login results and write a unit test for login (rough sketch below)
Was it login by hash or login by username/password?
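
Something along these lines as a rough sketch (JUnit 5 used here just for the sketch; the exact login signature is my assumption, not the confirmed UserDao API):

    import static org.junit.jupiter.api.Assertions.assertNotNull;

    import org.junit.jupiter.api.Test;

    // Rough sketch only: assumes a UserDao injected from the Spring context
    // and a login(email, password) style method, as discussed below.
    public class TestUserLogin {

        private UserDao userDao; // would be @Autowired in the real test

        @Test
        public void loginByEmailAndPassword() throws Exception {
            // hypothetical test credentials created during test setup
            assertNotNull(userDao.login("user@example.com", "secret"),
                    "login should return the user entity");
        }
    }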


On Fri, 5 Feb 2021 at 12:28, [email protected] <[email protected]>
wrote:

> Hi,
>
> as you noticed I added a branch with performance metrics. And I re-ran the
> tests with similar and larger user numbers.
>
> Results
>
>    - *80 users test*:
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test
>    - *140 users test*:
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test
>
> Findings
>
> *1 - Added index on address.email*
>
> I actually did the *80 users test* twice. I found that I could improve the
> performance of the login command by adding an index on address.email.
> See:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-CPUandmemoryusage
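>
> Just to illustrate the index itself - a minimal sketch of how it could be
> declared with JPA 2.1 annotations (the entity, column and index names here
> are my shorthand, not the exact OpenMeetings mapping; the same index can of
> course be created directly in SQL):
>
>     import javax.persistence.Column;
>     import javax.persistence.Entity;
>     import javax.persistence.Id;
>     import javax.persistence.Index;
>     import javax.persistence.Table;
>
>     // Illustrative only: declare an index on address.email so the schema
>     // tool generates it and the login lookup by email can use it.
>     @Entity
>     @Table(name = "address", indexes = {
>             @Index(name = "idx_address_email", columnList = "email")
>     })
>     public class Address {
>         @Id
>         private Long id;
>
>         @Column(name = "email")
>         private String email;
>
>         // getters/setters omitted
>     }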
>
> *2 - Login command performance gets increasingly worse with the number of users*
>
> While I could improve UserService::login (or more generally UserDao::login)
> somewhat with the index, *I then switched from 80 users to 140 users* and
> re-ran the tests: *the login command started to take 10+ seconds*.
>
>    - I also used actual log output to verify my metrics, because it just
>    seemed so strange!
>    - I also checked other methods; they are NOT affected, or at most increase
>    very slightly. It's NOT that everything is generally slow, but some
>    part of the application is slow!
>    - You can see how the duration increases during the test run here:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Loginwebservicecallhits10seconds
>    - And you can see here that it's not just the HTTP call that takes long,
>    but the actual UserDao::login command:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-DatabaseUserDaologinmethodisbyfarthelongestrunningoneandhitsplus10seconds
>
> => Now this is kind of puzzling. Other methods don't degrade like this; it's
> just the login command. The RoomDao methods, for example, do not perform
> badly, or at least not at that scale (not by far).
>
> *Questions:*
>
>    - Is it possible that the login method, which fetches the entire user,
>    has too many entities linked, so that even a simple fetch or login quickly
>    starts to take a long time? (see the sketch below)
>    - Is it really possible that only this method takes so long, or am I
>    measuring a general issue because the CPU spike is at 95%? Like I say:
>    RoomDao seems fine; are there other methods I should measure?
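>
> To illustrate the first question - a purely hypothetical sketch of the kind
> of mapping I have in mind (the field and entity names are made up, this is
> NOT the real OpenMeetings User mapping): every association that is fetched
> eagerly is loaded on every login, so a large eager graph makes even a simple
> fetch expensive, while LAZY defers the load until the data is accessed:
>
>     import java.util.List;
>     import javax.persistence.Entity;
>     import javax.persistence.FetchType;
>     import javax.persistence.Id;
>     import javax.persistence.OneToMany;
>
>     // Hypothetical mapping, for illustration only.
>     @Entity
>     public class User {
>         @Id
>         private Long id;
>
>         // If this (and several similar collections) were EAGER, every
>         // login would pull the whole object graph from the database.
>         @OneToMany(fetch = FetchType.EAGER)
>         private List<GroupUser> groupUsers;
>     }
>
>     @Entity
>     class GroupUser { // stub child entity, just so the sketch is complete
>         @Id
>         private Long id;
>     }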
>
> *3 - Tomcat threads unused or way below max*
>
> As you can see from these two graphs:
>
>    - 80 users test active threads:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-Tomcatactivethreads
>    - 140 users test active threads:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Tomcatactivethreads
>
> => 140 users utilize more threads, YES. But it's WAY below the 400
> available, not even close to that number. I don't think it's a threading
> issue. That also makes sense: given the test scenario, you probably have
> around 20-40 users hitting the server, trying to log in and enter the room
> at the same time.
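>
> If we want to double-check the Prometheus numbers independently, the Tomcat
> thread pool can also be read via JMX. A small sketch (it needs to run inside
> the Tomcat JVM or over a remote JMX connection; the connector and domain
> names vary by setup, hence the wildcards):
>
>     import java.lang.management.ManagementFactory;
>     import java.util.Set;
>     import javax.management.MBeanServer;
>     import javax.management.ObjectName;
>
>     public class ThreadPoolCheck {
>         public static void main(String[] args) throws Exception {
>             MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
>             // Match any domain (Catalina/Tomcat) and any connector name.
>             Set<ObjectName> pools =
>                     mbs.queryNames(new ObjectName("*:type=ThreadPool,name=*"), null);
>             for (ObjectName pool : pools) {
>                 System.out.printf("%s busy=%s current=%s max=%s%n",
>                         pool.getKeyProperty("name"),
>                         mbs.getAttribute(pool, "currentThreadsBusy"),
>                         mbs.getAttribute(pool, "currentThreadCount"),
>                         mbs.getAttribute(pool, "maxThreads"));
>             }
>         }
>     }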
>
> *4 - RoomPanel:onInitialise and RoomPanel:enterRoom*
>
> I put some metrics on those methods; you can see the results of the 140
> users test run here:
>
> http://54.162.44.21:5080/graph?g0.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
>  => It is definitely getting worse, BUT this is MILLISECONDS. It increased
> from 0.05 seconds to 0.3 seconds. That's getting slower, but not that bad.
>
> If you compare the same graph with the login command:
>
> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
>  => That is going from ~800ms to 12 seconds!
>
> It shows some similarity in the curve. But the durations are very
> different!
>
> *5 - Missing video pods massively increased from 80 to 140 users*
>
> During the time when login is slow, the number of conference rooms where
> video pods are missing increases massively.
>
> That could be because the OpenMeetings server is generally slow, or because
> something is particularly slow in fetching the user entity or in the login
> command.
>
> *6 - Missing video always issue on sender side*
>
> From my tests I could see that, if a video is missing, it is because a
> SENDER has:
> A) not started the video, or
> B) got stuck while the video is being published.
>
> Like I say, case (A) MASSIVELY increased once the server had 140 users. But
> I think it's valuable information to know that there seem to be TWO issues:
> one where the video pods don't start up at all, and one where, once the
> video pod has started, the video stream somehow doesn't get triggered.
>
>
> Generally - digging through above graphs and reports
>
> You can dig through all of the above metrics and test runs yourself. I will
> obviously delete those dashboards again shortly. But for now you can have a
> dig through them:
>
>    - Dashboard for v5.1.0 with additional indexes on Address.email
>    <
> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2000%3A30%3A00&g0.moment_input=2021-02-05%2000%3A30%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2000%3A30%3A00&g1.moment_input=2021-02-05%2000%3A30%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2000%3A30%3A00&g2.moment_input=2021-02-05%2000%3A30%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2000%3A30%3A00&g3.moment_input=2021-02-05%2000%3A30%3A00
> >
>    - Test with 140 users at 2021-02-05 02:52:00
>    <
> http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2002%3A52%3A00&g1.moment_input=2021-02-05%2002%3A52%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2002%3A52%3A00&g2.moment_input=2021-02-05%2002%3A52%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2002%3A52%3A00&g3.moment_input=2021-02-05%2002%3A52%3A00
> >
>
> If you go to those dashboards, it will probably help a lot to read them if
> you first have a look at:
>
> https://cwiki.apache.org/confluence/display/OPENMEETINGS/Prometheus+Logging+and+Metrics
>
> *What else ????*
>
>
>    - What are we thinking about the above findings? Is the login method a
>    performance issue, or just a symptom of a generally slow server? It
>    doesn't really look like a general issue; the difference between
>    UserDao::login and EVERYTHING else just seems too drastic.
>    - What other methods and calls should I try to trace and measure?
>    - Is there any event within initialising a video pod that I could
>    measure and put a metric on?
>
> Any other ideas?
>
> Thanks
> Sebastian
>
> Sebastian Wagner
> Director Arrakeen Solutions, OM-Hosting.com
> http://arrakeen-solutions.co.nz/
> https://om-hosting.com - Cloud & Server Hosting for HTML5
> Video-Conferencing OpenMeetings
> <
> https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url
> >
> <
> https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url
> >
>


-- 
Best regards,
Maxim
