Monitoring using what, though? That is the question.

APM products like New Relic don't capture the metrics needed to work out how to 
tune a WSGI server properly. If you have rolled your own way of capturing 
metrics and injecting them into something like DataDog, Prometheus or InfluxDB, 
my guess is that you are still likely not capturing what you need.

I have posted a bit about the topic before at:

http://blog.dscpl.com.au/2015/06/implementing-request-monitoring-within.html 

but even that doesn't show examples of how to get one of the most important 
metrics for tuning Apache/mod_wsgi, which is thread pool and capacity 
utilisation. I never seem to have followed up that post with the details of how 
to do it. :-)
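
As a rough illustration of what capacity utilisation means here, the following is a minimal sketch of a generic WSGI middleware. It is not the approach from the blog post and not a mod_wsgi API; the class name UtilisationMonitor and its threads argument are made up for the example. It treats utilisation as the time request threads spent busy, divided by the time the whole thread pool was available over a sampling window.

```python
import threading
import time


class UtilisationMonitor:
    """Hypothetical WSGI middleware that accumulates request busy time."""

    def __init__(self, app, threads):
        self.app = app
        self.threads = threads            # thread pool size of this process
        self.lock = threading.Lock()
        self.busy_time = 0.0              # seconds spent inside the application
        self.window_start = time.monotonic()

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            # Only the call into the application is timed, not the time
            # taken to consume the returned response iterable.
            return self.app(environ, start_response)
        finally:
            with self.lock:
                self.busy_time += time.monotonic() - start

    def sample(self):
        """Return utilisation since the last sample, then reset the window."""
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.window_start
            busy = self.busy_time
            self.busy_time = 0.0
            self.window_start = now
        if elapsed <= 0:
            return 0.0
        return busy / (elapsed * self.threads)
```

A background thread could call sample() every so often and push the value to whatever metrics system is in use. Busy time is only recorded when a request finishes, so a long request is attributed entirely to the window in which it ends, and a single sample can occasionally exceed 1.0.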

Graham

> On 2 Apr 2019, at 7:14 pm, Stéphane Poss <[email protected]> wrote:
> 
> Thanks a lot for the detailed information, I now better understand how the 
> parameters are related to one another. I do have monitoring in place, as of 
> course it's a necessity. 
> 
> Cheers,
> S Poss
>  
> 
> On Tue, 2 Apr 2019 at 01:52, Graham Dumpleton <[email protected]> wrote:
> Sorry for the slow reply. Been quite busy trying to finish off some stuff 
> before a holiday.
> 
>> On 28 Mar 2019, at 7:17 pm, Stéphane Poss <[email protected]> wrote:
>> 
>> Hi,
>> I am unable to find the correct doc about the problem I have and I hope 
>> you'll forgive my naivety. I'm running a django app using mod_wsgi with 
>> mpm_event. 
>> 
>> The env is: Server Version: Apache/2.4.25 (Debian) OpenSSL/1.0.2r 
>> mod_wsgi/4.6.4 Python/3.5
>> 
>> Here are the relevant bits of the Apache VirtualHost conf:
>> 
>>             WSGIScriptAlias / /opt/virtualenvs/server/lib/python3.5/site-packages/server/wsgi.py
>>             WSGIDaemonProcess server listen-backlog=200 processes=16 threads=30 display-name=%{GROUP} python-path=/opt/virtualenvs/server/lib/python3.5/site-packages restart-interval=3600 graceful-timeout=10
> 
> Initial impressions are:
> 
> 1. Threads per daemon process is way too high. Unless your application spends 
> most of its time waiting on I/O, I would recommend only 3-5 threads per 
> process and relying on multiple processes instead. Having a high number of 
> threads can mean thread resources go wasted, or requests could pile onto one 
> process and overwhelm it because of global interpreter lock contention.
> 
> 2. Increasing the listener backlog to 200 on the daemon process is probably 
> not a good idea. Even the default of about 100 is usually way too many. 
> Having a backlog there is a bad idea because if your system gets overloaded, 
> you have a backlog of requests which you still end up handling, yet the user 
> has probably gone away if the delay was significant. Better to reject 
> requests with an error using queue-timeout rather than put your application 
> in a permanently overloaded state where it never seems to catch up.
> 
> 3. You should not use python-path to refer to a site-packages directory. Use 
> python-home to refer to the root of the virtual environment instead.
> 
>>             WSGIApplicationGroup %{GLOBAL}
>>             WSGIProcessGroup server
>> 
>> I have another Virtual host listening on port 443 with config:
>> 
>>             WSGIScriptAlias / /opt/virtualenvs/server/lib/python3.5/site-packages/tile_server/wsgi.py
>>             WSGIProcessGroup server
>> 
>> 
>> the following is the mpm_event config:
>> 
>>         ServerLimit              32
>>         StartServers             3
>>         MinSpareThreads          75
>>         MaxSpareThreads          150
>>         ThreadLimit              64
>>         ThreadsPerChild          40
>>         MaxRequestWorkers        500
>>         MaxConnectionsPerChild   0
> 
> These settings are out of whack. Some general rules for setting these to 
> avoid strange behaviour:
> 
> 1. MaxRequestWorkers should be an integer multiple of ThreadsPerChild.
> 
> 2. MaxRequestWorkers would normally be ThreadsPerChild * ServerLimit.
> 
> 3. MinSpareThreads should be a multiple of ThreadsPerChild.
> 
> 4. MaxSpareThreads should be a multiple of ThreadsPerChild.
> 
> For (3) and (4), if they aren't a multiple of ThreadsPerChild, you can 
> trigger strange behaviour that might cause Apache to keep starting/stopping 
> child processes.
> 
> A better config might be:
> 
>         ServerLimit              32
>         StartServers             3
>         MinSpareThreads          75
>         MaxSpareThreads          150
>         ThreadLimit              64
>         ThreadsPerChild          25
>         MaxRequestWorkers        800
>         MaxConnectionsPerChild   0
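
The four rules above can be checked mechanically. The following is a small sketch, not part of Apache or mod_wsgi; the function name check_mpm and its argument names are made up for illustration. It applies the rules to the original settings and to the suggested ones.

```python
def check_mpm(server_limit, threads_per_child, max_request_workers,
              min_spare_threads, max_spare_threads):
    """Return a list of rule violations (empty when all four rules hold)."""
    problems = []
    # Rule 1: MaxRequestWorkers is an integer multiple of ThreadsPerChild.
    if max_request_workers % threads_per_child:
        problems.append("MaxRequestWorkers not a multiple of ThreadsPerChild")
    # Rule 2: MaxRequestWorkers is normally ThreadsPerChild * ServerLimit.
    if max_request_workers != threads_per_child * server_limit:
        problems.append("MaxRequestWorkers != ThreadsPerChild * ServerLimit")
    # Rules 3 and 4: spare thread limits are multiples of ThreadsPerChild.
    if min_spare_threads % threads_per_child:
        problems.append("MinSpareThreads not a multiple of ThreadsPerChild")
    if max_spare_threads % threads_per_child:
        problems.append("MaxSpareThreads not a multiple of ThreadsPerChild")
    return problems


# Original settings: ThreadsPerChild 40, MaxRequestWorkers 500.
original = check_mpm(32, 40, 500, 75, 150)
# Suggested settings: ThreadsPerChild 25, MaxRequestWorkers 800.
suggested = check_mpm(32, 25, 800, 75, 150)

print(len(original), "problems in the original settings")
print(len(suggested), "problems in the suggested settings")
```

Rule 2 is worded as "would normally be", so a mismatch there is better read as a warning than a hard error.
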
> 
>> I'm having a hard time finding the relationship between the 'processes' and 
>> 'threads' of the WSGIDaemonProcess and the StartServers, ThreadsPerChild and 
>> MaxRequestWorkers of the mpm_event config. I have checked some of the videos 
>> I found on other threads (very interesting!) but was not able to find the 
>> means to understand how to configure the 2 together.
> 
> The MPM settings above only relate to the Apache child worker processes. 
> These handle static file requests. For the WSGI application, all they do is 
> act as a proxy for those requests.
> 
> So MaxRequestWorkers should at least be greater than processes*threads of 
> daemon process group otherwise the Apache child processes would never have 
> enough capacity to proxy requests up to the capacity of the daemon process 
> group. You would add a bit extra capacity in the MPM settings, over what the 
> daemon process group can handle, so it has capacity to handle static requests 
> and queued up WSGI application requests.
> 
> What you set the MPM settings and daemon process group settings to depends on 
> request throughput, and whether the WSGI application is CPU or I/O bound.
> 
> You are never going to be able to tune these properly if you don't have a way 
> of monitoring throughput, request times, and performance of the server.
> 
> Bumping up threads in daemon process groups because you think you need to 
> handle a huge number of concurrent requests, more often than not will just 
> make things worse and is unnecessary.
> 
>> My issue is that I seem not to have a high CPU usage on the host (it's a 
>> VM), using cached data, while not being able to serve more than 60-70 
>> requests per second. I'm wondering what kind of knob I should turn to 
>> improve the server's usage and thus the request rate.
>> Another issue I discovered this morning is the following:
>> 
>> [Thu Mar 28 08:50:53.568321 2019] [wsgi:error] [pid 21253:tid 
>> 140125284001536] (2)No such file or directory: [client 
>> 2a02:121f:21b:0:c6cd:4394:566a:12ea:33664] mod_wsgi (pid=21253): Unable to 
>> connect to WSGI daemon process 'tile-elevation' on 
>> '/var/run/apache2/wsgi.15888.0.1.sock' as user with uid=33.
>> 
>> Looks like the socket was rotated, but I cannot see why...
> 
> This is usually because the operating system logging system is force 
> restarting Apache once a day so it can rotate log files, instead of letting 
> Apache rotate the log files itself. You can end up seeing this error when 
> keep alive connections were being used by a client, and the keep alive 
> connection survived because of graceful restart being used, but daemon 
> process had been restarted.
> 
> You can avoid this problem by setting the directive:
> 
>     WSGISocketRotation Off
> 
>> Thanks in advance for the assistance, and thanks for the great tool!
> 
> Beyond that, it is hard to suggest what you should use without you having 
> instrumentation for your WSGI application and mod_wsgi so you can monitor 
> throughput, request times, capacity utilisation etc.
> 
> Do you have any monitoring in place? Have you ruled out your Python 
> application code or backend databases etc. as the bottleneck?
> 
> I would at least suggest using:
> 
>     processes=16 threads=5
> 
> and see what happens. This will eliminate potential issues with the Python 
> GIL and piling up of requests in one process.
> 
> If however you are trying to test the setup by using a benchmarking tool at 
> maximum throughput, you are always going to get silly results. You should 
> never test a server setup in an overloaded state as it tells you nothing 
> about how to tune it and more often than not just triggers pathological 
> conditions in the server setup. You want to aim to test at 40-60% capacity, 
> and set systems up so they are always running at around that much for 
> typical traffic volumes, scaling horizontally when you need more capacity.
> 
> Finally, I will mention again the importance of monitoring. If you want to do 
> this properly, you need it.
> 
>> 
>> Cheers!
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] 
>> <mailto:[email protected]>.
>> To post to this group, send email to [email protected] 
>> <mailto:[email protected]>.
>> Visit this group at https://groups.google.com/group/modwsgi 
>> <https://groups.google.com/group/modwsgi>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
> 
> 

