It's running mod_wsgi 4.3.0, not the version you gave me.

On Monday, November 3, 2014 7:25:38 PM UTC+8, Kelvin Wong wrote:
>
> I obtained log messages on production when the error happened. The log 
> level is set to 'info'.
>
> [Mon Nov 03 11:11:14.947304 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Shutdown requested 'site-2'.
> [Mon Nov 03 11:11:14.947499 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Stopping process 'site-2'.
> [Mon Nov 03 11:11:14.947521 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Destroying interpreters.
> [Mon Nov 03 11:11:14.947536 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.951446 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Shutdown requested 'site-1'.
> [Mon Nov 03 11:11:14.951672 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Stopping process 'site-1'.
> [Mon Nov 03 11:11:14.951693 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Destroying interpreters.
> [Mon Nov 03 11:11:14.951708 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.955690 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Shutdown requested 'site-3'.
> [Mon Nov 03 11:11:14.955911 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Stopping process 'site-3'.
> [Mon Nov 03 11:11:14.955933 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Destroying interpreters.
> [Mon Nov 03 11:11:14.955947 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.967577 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Terminating Python.
> [Mon Nov 03 11:11:14.970224 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Python has shutdown.
> [Mon Nov 03 11:11:14.970246 2014] [wsgi:info] [pid 4476:tid 
> 139644458678144] mod_wsgi (pid=4476): Exiting process 'site-3'.
> [Mon Nov 03 11:11:14.975368 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Terminating Python.
> [Mon Nov 03 11:11:14.978051 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Python has shutdown.
> [Mon Nov 03 11:11:14.978072 2014] [wsgi:info] [pid 4477:tid 
> 139644458678144] mod_wsgi (pid=4477): Exiting process 'site-1'.
> [Mon Nov 03 11:11:14.983240 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Terminating Python.
> [Mon Nov 03 11:11:14.986019 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Python has shutdown.
> [Mon Nov 03 11:11:14.986041 2014] [wsgi:info] [pid 4478:tid 
> 139644458678144] mod_wsgi (pid=4478): Exiting process 'site-2'.
> [Mon Nov 03 11:11:18.985926 2014] [wsgi:error] [pid 1712:tid 
> 140701685757696] [client xx.xx.xx.xx:13183] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:11:57.511176 2014] [wsgi:error] [pid 2417:tid 
> 140701627008768] [client xx.xx.xx.xx:9079] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:11:59.504832 2014] [wsgi:error] [pid 2026:tid 
> 140701568259840] [client xx.xx.xx.xx:13194] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:02.893114 2014] [wsgi:error] [pid 1712:tid 
> 140701593437952] [client xx.xx.xx.xx:9123] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:15.471793 2014] [wsgi:error] [pid 2026:tid 
> 140701643794176] [client xx.xx.xx.xx:9145] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:18.741610 2014] [wsgi:error] [pid 1712:tid 
> 140701840639744] [client xx.xx.xx.xx:13230] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:32.470117 2014] [wsgi:error] [pid 1712:tid 
> 140701840639744] [client xx.xx.xx.xx:9723] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:33.336583 2014] [wsgi:error] [pid 2417:tid 
> 140701807068928] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10126] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:36.421071 2014] [wsgi:error] [pid 1712:tid 
> 140701660579584] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10148] mod_wsgi (pid=1712): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:38.102948 2014] [wsgi:error] [pid 1712:tid 
> 140701677364992] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10157] mod_wsgi (pid=1712): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:38.186446 2014] [wsgi:error] [pid 1712:tid 
> 140701685757696] [client xx.xx.xx.xx:13868] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:42.206195 2014] [wsgi:error] [pid 1712:tid 
> 140701823854336] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14302] mod_wsgi (pid=1712): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:42.366307 2014] [wsgi:error] [pid 3039:tid 
> 140701568259840] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14306] mod_wsgi (pid=3039): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:46.209203 2014] [wsgi:error] [pid 2417:tid 
> 140701593437952] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14308] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:47.026183 2014] [wsgi:error] [pid 2026:tid 
> 140701677364992] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14312] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.092478 2014] [wsgi:error] [pid 1712:tid 
> 140701652186880] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10163] mod_wsgi (pid=1712): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.505943 2014] [wsgi:error] [pid 2417:tid 
> 140701840639744] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14314] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.692284 2014] [wsgi:error] [pid 2417:tid 
> 140701719328512] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10173] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.725615 2014] [wsgi:error] [pid 2026:tid 
> 140701568259840] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14316] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:52.273296 2014] [wsgi:error] [pid 2026:tid 
> 140701685757696] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10175] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:53.356815 2014] [wsgi:error] [pid 2026:tid 
> 140701823854336] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14318] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:57.364586 2014] [wsgi:error] [pid 2026:tid 
> 140701585045248] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10178] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:57.620572 2014] [wsgi:error] [pid 2417:tid 
> 140701652186880] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14324] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:01.508145 2014] [wsgi:error] [pid 2026:tid 
> 140701719328512] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14331] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:03.685263 2014] [wsgi:error] [pid 3039:tid 
> 140701576652544] [client xx.xx.xx.xx:9777] Timeout when reading response 
> headers from daemon process 'site-1': 
> /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:20:07.131191 2014] [wsgi:error] [pid 2026:tid 
> 140701610223360] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14337] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:07.210243 2014] [wsgi:error] [pid 2026:tid 
> 140701601830656] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10185] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:10.990187 2014] [wsgi:error] [pid 2417:tid 
> 140701660579584] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10206] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.343626 2014] [wsgi:error] [pid 2026:tid 
> 140701807068928] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14353] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.383353 2014] [wsgi:error] [pid 2417:tid 
> 140701601830656] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10210] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.514056 2014] [wsgi:error] [pid 2026:tid 
> 140701568259840] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10213] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:13.106436 2014] [wsgi:error] [pid 2026:tid 
> 140701677364992] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14356] mod_wsgi (pid=2026): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:13.367651 2014] [wsgi:error] [pid 3039:tid 
> 140701668972288] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10219] mod_wsgi (pid=3039): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:17.423692 2014] [wsgi:error] [pid 2417:tid 
> 140701610223360] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:14359] mod_wsgi (pid=2417): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:17.450428 2014] [wsgi:error] [pid 3039:tid 
> 140701815461632] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10221] mod_wsgi (pid=3039): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:20.733221 2014] [wsgi:info] [pid 3794:tid 
> 140056943240960] mod_wsgi (pid=3794): Maximum requests reached, triggering 
> immediate shutdown 'site-1'.
> [Mon Nov 03 11:20:20.733323 2014] [wsgi:info] [pid 3794:tid 
> 140057235269504] mod_wsgi (pid=3794): Shutdown requested 'site-1'.
> [Mon Nov 03 11:20:20.733511 2014] [wsgi:info] [pid 3794:tid 
> 140057235269504] mod_wsgi (pid=3794): Stopping process 'site-1'.
> [Mon Nov 03 11:20:20.733528 2014] [wsgi:info] [pid 3794:tid 
> 140057235269504] mod_wsgi (pid=3794): Destroying interpreters.
> [Mon Nov 03 11:20:20.733538 2014] [wsgi:info] [pid 3794:tid 
> 140057235269504] mod_wsgi (pid=3794): Cleanup interpreter ''.
> [Mon Nov 03 11:20:20.734308 2014] [wsgi:error] [pid 3794:tid 
> 140057235269504] 2014-11-03 11:20:20,734 (3794/Dummy-7) newrelic.core.agent 
> INFO - New Relic Python Agent Shutdown
> [Mon Nov 03 11:20:20.680893 2014] [wsgi:error] [pid 1712:tid 
> 140701710935808] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:10226] mod_wsgi (pid=1712): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.1237.0.4.sock'.
>
> Please advise. Thanks.
>
>
> On Monday, November 3, 2014 6:43:19 PM UTC+8, Graham Dumpleton wrote:
>
> On 03/11/2014, at 9:35 PM, Kelvin Wong <[email protected]> wrote:
>
> I don't think the original error situation is caused by server overloading 
> as the traffic of the server on that time it's fine.
>
>
> I didn't say it was.
>
> I just want to rule out some conditions as returning 503 can be caused by 
> overloading the service.
>
> Overloading the servers can produce 503 but the error log messages do not 
> match with the original error log message.
>
>
> And I am only addressing the message:
>
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:54.109098 2014] [wsgi:error] 
> [pid 4615:tid 140372718073600] (11)Resource temporarily unavailable: 
> [client xx.xx.xx.xx:26793] mod_wsgi (pid=4615): Unable to connect to WSGI 
> daemon process 'site-1' on '/var/run/apache2/wsgi.3387.0.4.sock'.
>
> That area of the code was rewritten in 4.X versions at some point and 
> introduced non blocking connects. Looking more at that specific message and 
> knowing the possibility that you were load testing I came across that 
> difference in Linux behaviour compared to what POSIX specifies.
>
> So all I can do is address that specific point and that it is all I can do 
> until I can get those info level messages from the error logs so I can 
> understand whether the mod_wsgi daemon processes are actually restarting at 
> the same time.
>
> Graham
>
>
> On Monday, November 3, 2014 5:58:47 PM UTC+8, Graham Dumpleton wrote:
>
> I still want any log messages either side of that point as it may confirm 
> what is going on, but what you might be encountering could be due to Linux 
> behaviour when performing a connect() on a UNIX socket in non blocking mode 
> not being POSIX compliant. Details:
>
> http://lists.mysql.com/internals/35082
>
> The mod_wsgi code is possibly not dealing with Linux's non POSIX 
> compliance and instead of interpreting it the same as connection refused 
> (when the listener socket backlog is full), it is regarding it as an 
> outright failure.
>
> So the cause of this if you are trying to do load testing is that you are 
> overwhelming the capacity of the mod_wsgi daemon processes causing the 
> socket listener queue to fill up. When it fills up then it is triggering 
> the non POSIX behaviour of Linux.
>
> Can you confirm that you only saw this when you overwhelmed the server 
> with more traffic that you would realistically expect it to receive and way 
> beyond the capacity it could handle based on how it is configured?
>
> Graham
>
> On 03/11/2014, at 8:33 PM, Graham Dumpleton <[email protected]> wrote:
>
> As I said before, I need to see messages either side of the event and with 
> Apache log level set to info.
>
> Graham
>
> On 03/11/2014, at 8:28 PM, Kelvin Wong <[email protected]> wrote:
>
> I tried some load testing on the server to see if it is caused by 
> overloading.
>
> I found there will be some other log message when the server is 
> overloading.
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:53.795864 2014] [wsgi:error] 
> [pid 3724:tid 140372768429824] [client xx.xx.xx.xx:26561] Truncated or 
> oversized response headers received from daemon process 'site-1': 
> /home/ubuntu/site-1/apache/wsgi.py
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:54.109098 2014] [wsgi:error] 
> [pid 4615:tid 140372718073600] (11)Resource temporarily unavailable: 
> [client xx.xx.xx.xx:26793] mod_wsgi (pid=4615): Unable to connect to WSGI 
> daemon process 'site-1' on '/var/run/apache2/wsgi.3387.0.4.sock'.
>
> Then I concluded that the situation is not caused by overloading the 
> server.
>
> Do you have any directions for me to reproduce the situation?
>
> Thanks.
>
>
> On Monday, November 3, 2014 11:09:33 AM UTC+8, Kelvin Wong wrote:
>
> I found no cron is running on root and user.
>
> According to the logrotate config, the log rotate weekly.
> But the error situation is likely happened nightly.
>
> So I think it will not be my concern.
>
>
> On Monday, November 3, 2014 11:00:10 AM UTC+8, Graham Dumpleton wrote:
>
> And when is 'cron' configured to run logrotate? Does it correspond to when 
> you are seeing the issue?
>
> On 03/11/2014, at 1:56 PM, Kelvin Wong <[email protected]> wrote:
>
> Yes, the memory usage is under control now and I increased 
> maximum-requests by 4 times.
>
> For the error, 
> Timeout when reading response headers from daemon process 'site-1': /home/
> ubuntu/site-1/apache/wsgi.py
> came out as sudden, that's why I capture this message only. I also found 
> main Apache error log nothing special.
>
> Now, I turned the log level to 'info' in order to obtain more messages.
>
> About the log file rotation, I only found the ubuntu default rotation one 
> which is using logrotate.
> /var/log/apache2/*.log {
>         weekly
>         missingok
>         rotate 52
>         compress
>         delaycompress
>         notifempty
>         create 640 root adm
>         sharedscripts
>         postrotate
>                 if /etc/init.d/apache2 status > /dev/null ; then \
>                     /etc/init.d/apache2 reload > /dev/null; \
>                 fi;
>         endscript
>         prerotate
>                 if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
>                         run-parts /etc/logrotate.d/httpd-prerotate; \
>                 fi; \
>         endscript
> }
> It will do Apache reload on postrotate. I research a little bit, the 
> reload is needed for Apache handling file.
> Some suggested "copytruncate" on the config to avoid reload. Will you also 
> suggest that?
>
>
> Thanks.
>
>
> On Monday, November 3, 2014 8:00:10 AM UTC+8, Graham Dumpleton wrote:
>
>
> On 02/11/2014, at 2:18 AM, Kelvin Wong <[email protected]> wrote:
>
>
> I have some updates on the issue as it happened again. This time I 
> captured some data from New Relic.
>
> From the Requests.png, I found that there is an spike aroung 22:23 and 
> check the Apache log. I found there are many as following.
> Timeout when reading response headers from daemon process 'site-1': /home/
> ubuntu/site-1/apache/wsgi.py
>
> Spike also found on the Workers.png during that time period. The graph is 
> showing most of the worker is doing write?
> After a few minute, the following log appear again, and start returning 
> 503.
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi 
> (pid=27816): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.17227.2.3.sock'.
>
> Can you tell what happened from those graphs?
>
>
> What I need to see is more of the log file around those events.
>
> Ensure you have LogLevel set to 'info' in the Apache configuration as 
> mod_wsgi will then log more messages about why processes restarted.
>
> Also ensure you are checking for message around the time in the global 
> Apache error log and not just in any virtual host specific error log, as 
> some messages, such as evidence of a restart or process crashes will only 
> appear in the main Apache error log.
>
> With memory growth under control:
>
>
> turning to look at daemon process restarts:
>
>
> they are obviously more frequent. Zooming in, each process is restarting 
> about every 5 minutes, although varies over time.
>
> I would suggest you can increase maximum-requests, possibly as high as 4 
> times what you are using.
>
> When enabling 'info' logging that will reduce the amount of noise in the 
> error logs which may hide the specific anomalous event you are trying to 
> track down.
>
> BTW. Did you look into whether any log file rotation is set up or not and 
> what did you find?
>
> Depending on the system you might be able to find whether something is 
> running by looking at one cron jobs are running for root.
>
>     sudo crontab -l
>
> Graham
>
>
> Thanks.
>
>
> On Friday, October 31, 2014 5:08:27 PM UTC+8, Graham Dumpleton wrote:
>
>
> On 31/10/2014, at 7:26 PM, Kelvin Wong <[email protected]> wrote:
>
> Oh. I see. Thanks.
>
> One more question.
> As I set the graceful-timeout on mod_wsgi process, then when Apache do 
> graceful restart, will mod_wsgi obey that timeout setting?
>
>
> No. It is impossible. There is simply no way for the mod_wsgi daemon 
> processes to act in coordination with the Apache child worker processes on 
> an Apache graceful restart.
>
> This is because the only way that mod_wsgi can create its distinct child 
> processes and have them managed by Apache at all is using a mechanism that 
> in the Apache runtime library is referred to as a 'other processes'.
>
> It doesn't matter whether Apache as a whole is signalled for a normal 
> shutdown (SIGHUP) or a graceful restart (SIGUSR1), Apache will in turn 
> always send a SIGINT on to 'other processes' such as those created by 
> mod_wsgi. If those 'other processes' don't shutdown immediately it will 
> signal them again at one second intervals until 3 seconds have elapsed, at 
> which point it will send them a SIGKILL. So Apache only supports a brutal 
> shutdown of that sort for 'other processes'.
>
> For its own child worker processes, to even support a graceful restart is 
> a non trivial thing. It isn't just a matter of sending the child worker 
> processes a SIGUSR1. Instead there is an elaborate scheme involving what is 
> called a pipe of death between the Apache parent and the child worker 
> processes, as well as shared state via shared memory between the Apache 
> parent process so the child worker processes now what sort of shutdown has 
> been triggered. That sort of mechanism to support a graceful shutdown for 
> 'other processes' simply does exist.
>
> So it is how it is and there is no way to change things. In practice it 
> hasn't caused any issues and web requests being interrupted is always a 
> risk that a web application has to deal with anyway.
>
> Graham
>
> On Friday, October 31, 2014 4:16:55 PM UTC+8, Graham Dumpleton wrote:
>
>
>
> On 31/10/2014, at 6:53 PM, Kelvin Wong <[email protected]> wrote:
>
> You means that the OS triggered Apache graceful restart? If so, why the OS 
> do that?
>
>
> Yes and I just told you why, as part of a log rotation mechanism.
>
> Try googling 'log file rotation' and see what mechanism is used on your 
> specific Linux system.
>
> One link about one such solution which is often used is:
>
> http://www.cyberciti.biz/faq/how-do-i-rotate-log-files/
>
> Apache does itself have a way of handling log rotation itself, but likely 
> ignored by Linux distros which use their own preferred mechanism.
>
> Graham
>
>
> On Friday, October 31, 2014 3:45:35 PM UTC+8, Graham Dumpleton wrote:
>
> Your operating system could be signalling Apache during the middle of the 
> night so that the log files can be rotated.
>
> Graham
>
> On 31/10/2014, at 6:36 PM, Kelvin Wong <[email protected]> wrote:
>
> I checked the memory of overall Apache process, it goes up and down a 
> little. It Seemed the "maximum-requests" is doing its job.
>
> I found that when I do "sudo service apache reload", Apache will do 
> graceful restart as it shown in main Apache error log.
>
>
> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 
> 140052910765824] (11)Resource temporarily unavailable: [client xx.xx.xx.xx
> :xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon process 
> 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
> When before above situation happened, Apache also doing graceful restart.
> But this should not be triggered by anyone as I mentioned it happened 
> during sleep time.
> Is there any reasons that Apache do graceful restart by itself?
>
> Thanks.
>
>
> On Friday, October 31, 2014 2:55:32 PM UTC+8, Graham Dumpleton wrote:
>
> BTW, in your NR agent configuration file find the 
> error_collector.ignore_errors setting and change it to:
>
> error_collector.ignore_errors = django.http.request:UnreadablePostError
>
> These indicate the HTTP client connection dropping before all content had 
> been sent.
>
> Generally you want to ignore these and eliminate them from your error 
> count as they do not represent an error in your application but are the 
> result of a client problem.
>
> Adding that configuration setting will cause them to be ignored and not 
> recorded as an error.
>
> Graham
>
> On 31/10/2014, at 5:41 PM, Graham Dumpleton <[email protected]> wrote:
>
>
> On 31/10/2014, at 5:04 PM, Kelvin Wong <[email protected]> wrote:
>
> Thanks for the details!
>
> I now have a better picture on how it works.
> I employed all the recommendations you suggested. I will keep monitoring 
> the processes.
>
> But how about the graceful restart signaled by Apache? Why it happened?
>
>
> As far as the message:
>
> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 
> 140182690981632] (2)No such 
> file or directory: [client 24.171.250.159:60769] mod_wsgi (pid=24158): Unable 
> to 
> connect to WSGI daemon 
> process 'snaptee-production-api-ssl' on'/var/run/apache2/wsgi.30188.7.3.sock'.
>
> goes, as I said I believe that could only be the result of an Apache 
> graceful restart.
>
> My recollection is that a 'reload' under a Linux system using its process 
> management tools triggers an Apache graceful restart where as a 'restart' 
> does a normal Apache restart.
>
> I believe you said you used 'reload' and so that would be the source of 
> that.
>
> The whole reason though I have been trying to sort out what processes 
> actually had memory issues though was to avoid MaxConnectionsPerChild as 
> well to ensure that that wasn't a source of the problem as well.
>
> Now that we have you on a better configuration lets see now whether it 
> happens again, rather than worrying too much about what happened in the 
> past. Just note when you happen to do a 'reload', and so understand that if 
> seen at that time that that was likely the cause.
>
> Once things settle down for a while then can look at your 
> processes/threads configuration and capacity utilisation and so work out 
> whether a poor configuration there is causing prob
>
> ...

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to