Which is not surprising.

You are still allowing only 80 concurrent requests in the daemon process group 
(processes=4 x threads=20), yet the capacity of the Apache workers which are 
proxying to the daemon processes running the WSGI application is much greater.

The result is that you are likely creating a big backlog of requests within the 
Apache workers waiting to be proxied to the daemon processes, and the daemon 
processes aren't keeping up. At that point the worker processes will start 
getting failures to connect to the daemon processes, because the socket 
listener backlog for the daemon processes, which is currently set at 100, has 
been exceeded. The workers will keep attempting to reconnect, but will give up 
after about 20 seconds and report the error you see.

In short, you are overloading the capacity of the daemon processes to handle 
requests.
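
Purely as an illustration (my numbers, not a recommendation; the right values 
depend on your response times and memory budget), you could raise the daemon 
capacity and rein in the Apache worker capacity so the two are better matched, 
along these lines:

# Illustrative sketch only. The daemon process group now handles
# 8 x 25 = 200 concurrent requests instead of 4 x 20 = 80.
WSGIDaemonProcess application display-name=%{GROUP} processes=8 threads=25

# Apache workers capped at the same 200 so they cannot funnel more
# requests at the daemon processes than those can service. Only really
# viable once KeepAlive is off (see below), as idle keep alive
# connections would otherwise tie up the smaller pool of workers.
<IfModule mpm_worker_module>
    ServerLimit          8
    StartServers         8
    ThreadLimit          25
    ThreadsPerChild      25
    MinSpareThreads      25
    MaxSpareThreads      200
    MaxClients           200
    MaxRequestsPerChild  0
</IfModule>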

I describe this request funnel in my PyCon talk at:

http://lanyrd.com/2012/pycon/spcdg/

Using inactivity-timeout will not help at all in this case, and I am not even 
sure what you were hoping it would do. It will not make a difference because 
the problem is that the daemon processes are too busy, not that they are doing 
nothing.

If you are willing to give New Relic a go under its trial period, you could 
possibly get some insight into how much you are overloading the daemon 
processes. I talk about using it to monitor capacity utilisation in:

http://blog.newrelic.com/2012/09/11/introducing-capacity-analysis-for-python/

So long as you are using mod_wsgi 3.4, you would also be able to get an 
indication through New Relic of how much backlog is being created by looking at 
the request queuing time.

https://docs.newrelic.com/docs/features/request-queuing-and-tracking-front-end-time
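
That measurement depends on the front-most server stamping each request with 
an arrival time, which with Apache and mod_headers is typically done along 
these lines (check the New Relic docs above for the exact form expected):

# Requires mod_headers. The agent subtracts this arrival timestamp from
# the time the request reaches the WSGI application to derive queue time.
RequestHeader set X-Request-Start "%t"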

BTW, unless you have a really good reason not to, I would suggest turning 
KeepAlive to Off. It can actually make things worse, by forcing you to specify 
much higher capacity in the Apache workers. If KeepAlive is desirable for your 
specific application, then stick nginx in front of Apache, set KeepAlive to 
Off in Apache, and have it on in nginx. This way you shift the burden of 
handling keep alive connections to nginx, which due to its async nature will 
handle them much better and with fewer resources. You should then be able to 
drop the capacity you have in the Apache workers quite dramatically.
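
A minimal sketch of that arrangement (ports and server name are hypothetical; 
adjust to suit) might be:

# nginx terminates the client connections and handles keep alive
# cheaply, proxying each request through to Apache.
server {
    listen 80;
    server_name ztubuntuxlarge;              # hypothetical
    keepalive_timeout 10s;                   # keep alive lives here

    location / {
        proxy_pass http://127.0.0.1:8080;    # Apache moved to port 8080
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

with 'Listen 8080' and 'KeepAlive Off' on the Apache side. Each Apache worker 
thread is then tied up only for the duration of an actual request, not for an 
idle keep alive period.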

Graham

On 28/02/2014, at 7:16 AM, Mario Adrian Lopez Aleman <[email protected]> wrote:

> Hi Graham.
> 
> I'm still having problems with these messages in the logs:
> [Wed Feb 19 03:31:04 2014] [error] [client 192.168.6.17] (2)No such file or 
> directory: mod_wsgi (pid=11996): Unable to connect to WSGI daemon process 
> 'ztrustee' on '/var/run/apache2/wsgi.1342.0.1.sock' after multiple attempts.
> [Wed Feb 19 03:31:04 2014] [error] [client 192.168.6.16] Premature end of 
> script headers: index.wsgi
> [Wed Feb 19 18:01:02 2014] [error] [client 192.168.6.8] (4)Interrupted system 
> call: mod_wsgi (pid=29224): Unable to connect to WSGI daemon process 
> 'ztrustee' on '/var/run/apache2/wsgi.1331.0.1.sock' after multiple attempts.
> 
> My configuration is now this one:
> 
> Timeout 3800
> KeepAlive On
> MaxKeepAliveRequests 0
> KeepAliveTimeout 10
> 
> <IfModule mpm_worker_module>
>     ThreadLimit          5000
>     ServerLimit          16
>     StartServers         16
>     MaxClients           1600
>     MinSpareThreads      1500
>     MaxSpareThreads      1600
>     ThreadsPerChild      100
>     MaxRequestsPerChild   0
> </IfModule>
> 
> WSGIDaemonProcess application display-name=%{GROUP} processes=4 threads=20
> 
> 
> The first batch of clients (now 1500) executes without problem and every 
> client completes all the tasks.
> But in the second run, I start to see these messages in the logs again.
> I tried to use the option inactivity-timeout=120 but apparently it doesn't 
> have any effect on the server (except that it restarts the daemon after 2 
> minutes without use).
> 
> I have been modifying the configuration, but I don't see what the error 
> could be in this one.
> 
> Please advise.
> 
> Regards!
> 
> 
> On Wednesday, 19 February 2014 at 21:17:21 UTC-6, Graham Dumpleton wrote:
> One immediate concern is that your daemon process group will only be able to 
> handle 15 concurrent requests (the default), yet you have configured the main 
> Apache worker processes, which funnel requests into the daemon process group, 
> to be able to handle up to 4000 concurrent requests.
> 
> That is one rather big bottleneck.
> 
> I would very much suggest trying different settings for processes/threads on 
> the WSGIDaemonProcess directive. Right now it is defaulting to a single 
> process with 15 threads.
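> 
> For example (numbers purely illustrative, to be tuned to your workload):
> 
> WSGIDaemonProcess application display-name=%{GROUP} processes=4 threads=25
> 
> That would allow 100 concurrent requests into the WSGI application rather 
> than 15.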
> 
> A second problem is that you are duplicating your application with two 
> separate sets of instances for HTTP and HTTPS traffic. This can be wasteful 
> of memory, although I am not entirely sure whether that is intentional, 
> since the WSGIScriptAlias directives for 443 and 80 actually refer to 
> different WSGI script files.
> 
> If you have an 80/443 pair, you would normally do:
> 
> <VirtualHost *:443>
> ServerName xxx
> WSGIDaemonProcess application
> WSGIProcessGroup application
> WSGIApplicationGroup %{GLOBAL}
> WSGIScriptAlias / /some/path/wsgi.py
> </VirtualHost>
> 
> <VirtualHost *:80>
> ServerName xxx
> WSGIProcessGroup application
> WSGIApplicationGroup %{GLOBAL}
> WSGIScriptAlias / /some/path/wsgi.py
> </VirtualHost>
> 
> That is, you would only define WSGIDaemonProcess in the first VirtualHost 
> for that ServerName seen by Apache when reading the configuration file, and 
> then refer to it by name using WSGIProcessGroup in the second, rather than 
> having a second WSGIDaemonProcess directive.
> 
> You would only do this though if they are the same code base, especially if 
> Django. As I said, you have separate WSGI script files, so I am not sure if 
> it is the same code base.
> 
> If you haven't watched them already, I would suggest you watch:
> 
> http://lanyrd.com/2012/pycon/spcdg/
> http://lanyrd.com/2013/pycon/scdyzk/
> 
> These mention the funnelling issue with daemon mode and give other tips on 
> tuning.
> 
> Graham
> 
> On 20/02/2014, at 12:47 PM, Mario Adrian Lopez Aleman <[email protected]> 
> wrote:
> 
>> Sorry, I have been working on that.
>> I have two VirtualHosts.
>> 
>> <VirtualHost *:443>
>>      SSLEngine On
>>      SSLCompression Off
>>      SSLProtocol +TLSv1 +SSLv3
>>      SSLHonorCipherOrder On
>>      SSLCipherSuite 
>> ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:HIGH:!MD5:!aNULL:!EDH
>>      SSLCertificateFile      
>> /var/lib/application/.ssl/ssl-cert-application.pem
>>      SSLCertificateKeyFile 
>> /var/lib/application/.ssl/ssl-cert-application-pk.pem
>>      SSLCertificateChainFile 
>> /var/lib/application/.ssl/ssl-cert-application-ca.pem
>>      ServerName ztubuntuxlarge
>>      ServerAlias application
>>      ServerAdmin webmaster@ztubuntuxlarge
>> 
>>      WSGIApplicationGroup %{GLOBAL}
>>      WSGIDaemonProcess application display-name=%{GROUP}
>>      WSGIProcessGroup application
>> 
>>      WSGIScriptAlias / /usr/share/application-server/www/index.wsgi
>> 
>>      <Directory /usr/share/application-server/www>
>>              Order allow,deny
>>              Allow from all
>>      </Directory>
>> 
>>      Alias /api/ /usr/share/doc/application-docs/html/
>>      <Directory /usr/share/doc/application-docs/html/>
>>              AuthUserFile /var/lib/application/.htpasswd-api
>>              AuthName "Authorization Required"
>>              AuthType Basic
>>              require valid-user
>>              Order allow,deny
>>              Allow from all
>>      </Directory>
>> 
>>      ErrorLog /var/log/application/webapp.log
>>      LogLevel info
>>      CustomLog /var/log/application/access.log combined
>> 
>> </VirtualHost>
>> 
>> <VirtualHost *:80>
>>      ServerName ztubuntuxlarge
>>      ServerAlias hkp
>>      ServerAdmin webmaster@ztubuntuxlarge
>> 
>>      WSGIApplicationGroup %{GLOBAL}
>>      WSGIDaemonProcess hkp display-name=%{GROUP}
>>      WSGIProcessGroup hkp
>> 
>>      WSGIScriptAlias / /usr/share/application-server/www/hkp.wsgi
>> 
>>      <Directory /usr/share/application-server/www>
>>              Order allow,deny
>>              Allow from all
>>      </Directory>
>> 
>>      LogLevel info
>>      ErrorLog /var/log/application/hkp.log
>> </VirtualHost>
>> 
>> 
>> Apache config:
>> KeepAlive On
>> KeepAliveTimeout 1200
>> MaxKeepAliveRequests 0
>> 
>> I modified my configuration in MPM
>> <IfModule mpm_worker_module>
>>     ThreadLimit          8000
>>     ServerLimit          40
>>     StartServers         40
>>     MaxClients           4000
>>     MinSpareThreads      2000
>>     MaxSpareThreads      2500
>>     ThreadsPerChild      100
>>     MaxRequestsPerChild   0
>> </IfModule>
>> 
>> 
>> I'm not using a common testing tool.
>> I have a Python script that creates 500 clients on each of 4 machines 
>> (2000 clients).
>> 
>> The server usually connects with a client, and this client can upload and 
>> download files.
>> In this particular test I execute a registration (involves a download and 
>> an upload) and after that it downloads a shared file.
>> 
>> I have been running up to 2000 concurrent clients right now, and everything 
>> seems to work fine (of the 2000, just 11 failed).
>> The total run time is about 1 hour with all 2000 clients.
>> (From the registration to the get, it takes 1 hour to finish, so each 
>> client takes around 50 to 60 minutes.)
>> 
>> If I'm not clear in something please let me know.
>> 
>> 
>> On Wednesday, 19 February 2014 at 19:13:01 UTC-6, Graham Dumpleton wrote:
>> Can you please provide the rest of your mod_wsgi configuration.
>> 
>> You do not show what you have set for WSGIDaemonProcess, nor enough about 
>> the structure of stuff in your VirtualHost to validate that things are set 
>> up correctly to delegate requests to that daemon process group.
>> 
>> For the Apache configuration, provide what you are using for the KeepAlive, 
>> KeepAliveTimeout and MaxKeepAliveRequests directives.
>> 
>> Also provide information about how long your average and worst case response 
>> times typically are for your web application. Plus details on what load 
>> testing tool you are using and what options you are using with that in 
>> regard to concurrency, use of keep alive etc.
>> 
>> Right now your MPM configuration settings seem to be excessively high.
>> 
>> Graham
>> 
>> On 20/02/2014, at 5:29 AM, Mario Adrian Lopez Aleman <[email protected]> 
>> wrote:
>> 
>>> Hi everyone.
>>> 
>>> I'm using a big server to run my site and I'm having issues when trying to 
>>> do some performance tests on the server.
>>> Here are the specs of the server:
>>> Ubuntu 12.04.3 x64 3.2.0-55-virtual
>>> 60GB RAM
>>> Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (16 cores)
>>> 
>>> Python 2.7.3
>>> 
>>> libapache2-mod-wsgi     3.3-4build1
>>> apache2                 2.2.22-1ubuntu1.4
>>> apache2-mpm-worker      2.2.22-1ubuntu1.4
>>> 
>>> Server version: Apache/2.2.22 (Ubuntu)
>>> Server built:   Jul 12 2013 13:37:15
>>> Server's Module Magic Number: 20051115:30
>>> Server loaded:  APR 1.4.6, APR-Util 1.3.12
>>> Compiled using: APR 1.4.6, APR-Util 1.3.12
>>> Architecture:   64-bit
>>> Server MPM:     Worker
>>>   threaded:     yes (fixed thread count)
>>>     forked:     yes (variable process count)
>>> Server compiled with....
>>>  -D APACHE_MPM_DIR="server/mpm/worker"
>>>  -D APR_HAS_SENDFILE
>>>  -D APR_HAS_MMAP
>>>  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
>>>  -D APR_USE_SYSVSEM_SERIALIZE
>>>  -D APR_USE_PTHREAD_SERIALIZE
>>>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>>>  -D APR_HAS_OTHER_CHILD
>>>  -D AP_HAVE_RELIABLE_PIPED_LOGS
>>>  -D DYNAMIC_MODULE_LIMIT=128
>>>  -D HTTPD_ROOT="/etc/apache2"
>>>  -D SUEXEC_BIN="/usr/lib/apache2/suexec"
>>>  -D DEFAULT_PIDLOG="/var/run/apache2.pid"
>>>  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
>>>  -D DEFAULT_ERRORLOG="logs/error_log"
>>>  -D AP_TYPES_CONFIG_FILE="mime.types"
>>>  -D SERVER_CONFIG_FILE="apache2.conf"
>>> 
>>> <IfModule mpm_worker_module>
>>>     ServerLimit          900
>>>     StartServers         900
>>>     MinSpareThreads      100
>>>     MaxSpareThreads      500
>>>     ThreadLimit          6000
>>>     ThreadsPerChild      32
>>>     MaxClients           28800
>>>     MaxRequestsPerChild   0
>>> </IfModule>
>>> 
>>> www-data limits
>>> time(seconds)        unlimited
>>> file(blocks)         unlimited
>>> data(kbytes)         unlimited
>>> stack(kbytes)        8192
>>> coredump(blocks)     0
>>> memory(kbytes)       unlimited
>>> locked memory(kbytes) 64
>>> process              unlimited
>>> nofiles              65535
>>> vmemory(kbytes)      unlimited
>>> locks                unlimited
>>> 
>>> The problems I'm seeing in my logs are these messages:
>>> [Wed Feb 19 03:31:04 2014] [error] [client 192.168.6.17] (2)No such file or 
>>> directory: mod_wsgi (pid=11996): Unable to connect to WSGI daemon process 
>>> 'ztrustee' on '/var/run/apache2/wsgi.1342.0.1.sock' after multiple attempts.
>>> [Wed Feb 19 03:31:04 2014] [error] [client 192.168.6.16] Premature end of 
>>> script headers: index.wsgi
>>> [Wed Feb 19 18:01:02 2014] [error] [client 192.168.6.8] (4)Interrupted 
>>> system call: mod_wsgi (pid=29224): Unable to connect to WSGI daemon process 
>>> 'ztrustee' on '/var/run/apache2/wsgi.1331.0.1.sock' after multiple attempts.
>>> 
>>> I already added this directive:
>>> WSGIApplicationGroup %{GLOBAL}
>>> 
>>> And also checked the permission/prefix problem in:
>>> /var/run/apache2
>>> 
>>> 
>>> These problems appear when doing performance tests (using 1000 ~ 1500 
>>> clients), however the load on the server is not that high:
>>> MAX RAM ~5 GB
>>> MAX CPU ~200% (MAX is 1600%)
>>> 
>>> And the clients are seeing this message:
>>> 
>>>     500 Internal Server Error
>>>     Internal Server Error
>>>     The server encountered an internal error or misconfiguration and was 
>>>     unable to complete your request.
>>>     Please contact the server administrator, webmaster@ztubuntuxlarge and 
>>>     inform them of the time the error occurred, and anything you might 
>>>     have done that may have caused the error.
>>>     More information about this error may be available in the server 
>>>     error log.
>>>     Apache/2.2.22 (Ubuntu) Server at 192.168.6.13 Port 443
>>> 
>>> Hopefully someone can point me in the right direction.
>>> Regards!