My password was 100% correct, because it's a test script I run. There wasn't a single error during login.
And the method got invoked exactly 80 times (or 140 times in the bigger test); you can check my dashboards. You can see the number of calls and the rate in this graph (for the 140 users test):

http://54.162.44.21:5080/graph?g0.expr=sum(increase(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D))&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
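For reference, URL-decoded, the expression behind that graph is the per-minute increase of the call counter, summed over the user web-service paths:

    sum(increase(webapp_metrics_filter_count{path=~"/openmeetings/services/user/.+"}[1m]))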
Will invoking this method 140 times in a 5 minute period make the CPU spike to 95% and lead to 10 second call lengths? :)

Also, like I say, I have the same metric on UserDao::login, and it looks the same.

I don't think so.

Thanks
Seb

Sebastian Wagner
Director Arrakeen Solutions, OM-Hosting.com
http://arrakeen-solutions.co.nz/
https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
<https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
<https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>


On Fri, 5 Feb 2021 at 18:44, Maxim Solodovnik <[email protected]> wrote:

> Can it be the password hash function?
> It should be slow (for security reasons)
> And it should be even slower if a wrong password was provided
>
> On Fri, 5 Feb 2021 at 12:42, [email protected] <[email protected]> wrote:
>
> > It is login via the API: /services/users/login/
> >
> > But: I've put the metric also on UserDao::login and the timings look identical.
> > So it shouldn't matter how you log in, I think.
> >
> > It just seems so strange that this method takes so long while RoomDao and others don't behave that way.
> >
> > Thanks
> > Seb
> >
> > Sebastian Wagner
> > Director Arrakeen Solutions, OM-Hosting.com
> > http://arrakeen-solutions.co.nz/
> > https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
> > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
> >
> > On Fri, 5 Feb 2021 at 18:38, Maxim Solodovnik <[email protected]> wrote:
> >
> > > sorry for top posting
> > >
> > > will re-read more carefully later
> > >
> > > What I would like to do as a first step:
> > > 1) add your index to the DB
> > > 2) analyze your login results and write a unit test for login
> > > was it login by hash or login by username/password?
> > >
> > > On Fri, 5 Feb 2021 at 12:28, [email protected] <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > as you noticed I added a branch with performance metrics, and did rerun the tests with similar and larger user numbers.
> > > >
> > > > Results
> > > >
> > > > - *80 users test*:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test
> > > > - *140 users test*:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test
> > > >
> > > > Findings
> > > >
> > > > *1 - Added index on address.email*
> > > >
> > > > I actually did the *80 users test* twice. I found that I could improve the performance of the login command by adding an index on address.email. See:
> > > > https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-CPUandmemoryusage
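As an aside on that index: declared on the JPA entity it could look roughly like the sketch below. The entity, table and column names here are assumptions for illustration and may not match the actual OpenMeetings mapping; the same index can of course also be created directly in the database.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Index;
    import javax.persistence.Table;

    // Sketch only: an extra index on the e-mail column, so the
    // "find user by e-mail" lookup during login does not scan the whole table.
    @Entity
    @Table(name = "address", indexes = {
        @Index(name = "idx_address_email", columnList = "email")
    })
    public class Address {
        @Id
        private Long id;

        @Column(name = "email")
        private String email;

        // ... remaining fields omitted
    }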
> > > > *2 - Login command performance gets increasingly bad with the number of users*
> > > >
> > > > While I could improve the UserService::login (or rather UserDao::login) timings with the index, *I then switched from 80 users to 140 users* and reran the tests: *the login command started to take 10+ seconds*.
> > > >
> > > > - I also used an actual log statement to verify my metrics, because it just seemed so strange!
> > > > - I also checked other methods: they are NOT affected, or at most increase very slightly. The application is NOT generally slow, but some part of it is!
> > > > - You can see how the duration increases during the test run here:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Loginwebservicecallhits10seconds
> > > > - And you can see here that it's not just the HTTP call that takes long, but the actual UserDao::login command:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-DatabaseUserDaologinmethodisbyfarthelongestrunningoneandhitsplus10seconds
> > > >
> > > > => Now this is kind of puzzling. Other methods don't degrade that badly; it's just this login command. The RoomDao methods, for example, do not perform badly, or at least not at that scale (by far not).
> > > >
> > > > *Questions:*
> > > >
> > > > - Is it possible that the login method, which fetches the entire user, has too many linked entities, so that even a simple fetch or login quickly starts to take a long time?
> > > > - Is it really possible that only this method takes so long, or am I measuring a general issue because the CPU spikes to 95%? Like I say, RoomDao seems fine; are there other methods I should measure?
> > > >
> > > > *3 - Tomcat threads unused or way below max*
> > > >
> > > > As you can see between those two graphs:
> > > >
> > > > - 80 users test active threads:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+80+users+test#OpenMeetings80userstest-Tomcatactivethreads
> > > > - 140 users test active threads:
> > > >   https://cwiki.apache.org/confluence/display/OPENMEETINGS/OpenMeetings+140+users+test#OpenMeetings140userstest-Tomcatactivethreads
> > > >
> > > > => 140 users utilize more threads, YES. But it's WAY below the 400 available, not even close to that number. I don't think it's a threading issue. That also makes sense: given the test scenario, you probably have around 20-40 users hitting the server at the same time, trying to log in and enter the room.
> > > >
> > > > *4 - RoomPanel::onInitialise and RoomPanel::enterRoom*
> > > >
> > > > I put some metrics on those methods; you can see the results of the 140 users test run here:
> > > >
> > > > http://54.162.44.21:5080/graph?g0.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%2Cclass%3D%22RoomPanel%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
> > > >
> > > > => It is definitely getting worse, BUT this is MILLISECONDS: it increased from 0.05 seconds to 0.3 seconds. That's getting slower, but not that bad.
> > > >
> > > > If you compare the same graph with the login command:
> > > >
> > > > http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2Fuser%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00
> > > >
> > > > => That is going from ~800ms to 12 seconds!
> > > >
> > > > It shows some similarity in the curve, but the durations are very different!
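For reference, URL-decoded, both duration graphs plot the average call duration over one minute, i.e. the rate of the summed durations divided by the rate of the call count; for the login graph that is:

    rate(webapp_metrics_filter_sum{path=~"/openmeetings/services/user/.+"}[1m])
    /
      rate(webapp_metrics_filter_count{path=~"/openmeetings/services/user/.+"}[1m])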
> > > > *5 - Missing video pods massively increased from 80 to 140 users*
> > > >
> > > > During the time when login is slow, the number of conference rooms where video pods are missing increases massively.
> > > >
> > > > That could be because the OpenMeetings server is generally slow, or because something is particularly slow in fetching the user entity or in the login command.
> > > >
> > > > *6 - Missing video is always an issue on the sender side*
> > > >
> > > > From my tests I could see that, if a video is missing, it is because a SENDER has:
> > > > A) not started the video, or
> > > > B) got stuck while the video was being published
> > > >
> > > > Like I say, case (A) MASSIVELY increased once the server had 140 users. But I think it's valuable to know that there seem to be TWO issues: one where video pods don't start up at all, and another where, once the video pod has started, the video stream somehow doesn't get triggered.
> > > >
> > > > Generally - digging through the above graphs and reports
> > > >
> > > > You can dig through all of the above metrics and test runs yourself. I will obviously delete those dashboards again shortly, but for now you can have a dig through them:
> > > >
> > > > - Dashboard for v5.1.0 with additional indexes on Address.email
> > > >   <http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2000%3A30%3A00&g0.moment_input=2021-02-05%2000%3A30%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2000%3A30%3A00&g1.moment_input=2021-02-05%2000%3A30%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2000%3A30%3A00&g2.moment_input=2021-02-05%2000%3A30%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2000%3A30%3A00&g3.moment_input=2021-02-05%2000%3A30%3A00>
> > > > - Test with 140 users at 2021-02-05 02:52:00
> > > >   <http://54.162.44.21:5080/graph?g0.expr=rate(webapp_metrics_filter_sum%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(webapp_metrics_filter_count%7Bpath%3D~%22%2Fopenmeetings%2Fservices%2F.%2B%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=15m&g0.end_input=2021-02-05%2002%3A52%3A00&g0.moment_input=2021-02-05%2002%3A52%3A00&g1.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22application%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22application%22%7D%5B1m%5D)&g1.tab=0&g1.stacked=0&g1.range_input=15m&g1.end_input=2021-02-05%2002%3A52%3A00&g1.moment_input=2021-02-05%2002%3A52%3A00&g2.expr=rate(org_openmeetings_metrics_sum%7Btype%3D%22database%22%7D%5B1m%5D)%0A%2F%0A%20%20rate(org_openmeetings_metrics_count%7Btype%3D%22database%22%7D%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=15m&g2.end_input=2021-02-05%2002%3A52%3A00&g2.moment_input=2021-02-05%2002%3A52%3A00&g3.expr=tomcat_threads_active_total&g3.tab=0&g3.stacked=0&g3.range_input=15m&g3.end_input=2021-02-05%2002%3A52%3A00&g3.moment_input=2021-02-05%2002%3A52%3A00>
> > > >
> > > > If you go to those dashboards, it will probably help a lot to first have a look at:
> > > > https://cwiki.apache.org/confluence/display/OPENMEETINGS/Prometheus+Logging+and+Metrics
> > > >
> > > > *What else?*
> > > >
> > > > - What are we thinking about the above findings? Is the login method a performance issue, or just a symptom of a generally slow server? It doesn't really look like a general issue; the difference between UserDao::login and EVERYTHING else seems just too drastic.
> > > > - What other methods and calls should I try to trace and measure?
> > > > - Is there any event within initialising a video pod that I could measure and add a metric for?
> > > >
> > > > Any other ideas?
> > > >
> > > > Thanks
> > > > Sebastian
> > > >
> > > > Sebastian Wagner
> > > > Director Arrakeen Solutions, OM-Hosting.com
> > > > http://arrakeen-solutions.co.nz/
> > > > https://om-hosting.com - Cloud & Server Hosting for HTML5 Video-Conferencing OpenMeetings
> > > > <https://www.youracclaim.com/badges/da4e8828-743d-4968-af6f-49033f10d60a/public_url>
> > > > <https://www.youracclaim.com/badges/b7e709c6-aa87-4b02-9faf-099038475e36/public_url>
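On the question of what else to trace and measure: below is a minimal sketch of timing an arbitrary call with the plain Prometheus Java client (io.prometheus:simpleclient). The metric and class names are made up for illustration, and this is not necessarily how the metrics branch instruments things; anything wrapped this way can then be graphed with the same rate(_sum)/rate(_count) pattern as the graphs above.

    import java.util.function.Supplier;
    import io.prometheus.client.Histogram;

    // Sketch only: wrap any suspect call in a Prometheus histogram so its
    // duration is exposed as <name>_sum / <name>_count, like the existing metrics.
    public final class CallTimer {

        private static final Histogram DURATION = Histogram.build()
                .name("example_call_duration_seconds")
                .help("Duration of a timed call in seconds")
                .labelNames("method")
                .register();

        private CallTimer() {
        }

        public static <T> T timed(String method, Supplier<T> call) {
            Histogram.Timer timer = DURATION.labels(method).startTimer();
            try {
                return call.get();
            } finally {
                timer.observeDuration();
            }
        }
    }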
> > >
> > > --
> > > Best regards,
> > > Maxim
> >
>
> --
> Best regards,
> Maxim
>
