Can you supply the main Apache error log for the same period of time? Not all 
information is in the virtual host specific error log, including key 
information which captures when an Apache restart was triggered.

Based on the incomplete information from just that log, it opens the 
possibility that the restart was actually the outcome of problems which arose 
and not the trigger. I really need that main Apache error log to be sure.

As well as supplying the main Apache error log, can you indicate whether your 
system is set up with any of these things.

1. Any sort of backup process, be it file system backups or database backups. 
If there are such backup services, what time of day do they run.

2. Any sort of intensive batch update processing which loads data into any 
database at a specific time of day.

3. Do you have any monitoring system which checks for the liveness of Apache 
and which would force a restart if it was unable to poll Apache and thus though 
it was hung.

Lack of a permalink from NR UI (bottom right of pages) makes it hard to find 
the specific time of day for the event you are talking about because of 
uncertainties of timezones, but I see evidence of various other strange things 
going on at times. Lets see those main Apache error logs first though and then 
can properly come up with a plausible explanation of what may have triggered 
things.

Graham

On 04/11/2014, at 2:53 AM, Kelvin Wong <[email protected]> wrote:

> I attached the error log. See it is enough.
> 
> 
> On Monday, November 3, 2014 8:10:00 PM UTC+8, Graham Dumpleton wrote:
> Can you provide further back in time and further forward in time.
> 
> Basically, from when everything looked normal until everything was back to 
> normal.
> 
> I get the feel that this has chopped off bits and not everything I might want 
> to see is there.
> 
> Sleep time here. Will then look tomorrow.
> 
> Graham
> 
> On 03/11/2014, at 10:25 PM, Kelvin Wong <[email protected]> wrote:
> 
> I obtained log messages on production when the error happened. The log level 
> is set to 'info'.
> 
> [Mon Nov 03 11:11:14.947304 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Shutdown requested 'site-2'.
> [Mon Nov 03 11:11:14.947499 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Stopping process 'site-2'.
> [Mon Nov 03 11:11:14.947521 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Destroying interpreters.
> [Mon Nov 03 11:11:14.947536 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.951446 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Shutdown requested 'site-1'.
> [Mon Nov 03 11:11:14.951672 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Stopping process 'site-1'.
> [Mon Nov 03 11:11:14.951693 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Destroying interpreters.
> [Mon Nov 03 11:11:14.951708 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.955690 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Shutdown requested 'site-3'.
> [Mon Nov 03 11:11:14.955911 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Stopping process 'site-3'.
> [Mon Nov 03 11:11:14.955933 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Destroying interpreters.
> [Mon Nov 03 11:11:14.955947 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Cleanup interpreter ''.
> [Mon Nov 03 11:11:14.967577 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Terminating Python.
> [Mon Nov 03 11:11:14.970224 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Python has shutdown.
> [Mon Nov 03 11:11:14.970246 2014] [wsgi:info] [pid 4476:tid 139644458678144] 
> mod_wsgi (pid=4476): Exiting process 'site-3'.
> [Mon Nov 03 11:11:14.975368 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Terminating Python.
> [Mon Nov 03 11:11:14.978051 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Python has shutdown.
> [Mon Nov 03 11:11:14.978072 2014] [wsgi:info] [pid 4477:tid 139644458678144] 
> mod_wsgi (pid=4477): Exiting process 'site-1'.
> [Mon Nov 03 11:11:14.983240 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Terminating Python.
> [Mon Nov 03 11:11:14.986019 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Python has shutdown.
> [Mon Nov 03 11:11:14.986041 2014] [wsgi:info] [pid 4478:tid 139644458678144] 
> mod_wsgi (pid=4478): Exiting process 'site-2'.
> [Mon Nov 03 11:11:18.985926 2014] [wsgi:error] [pid 1712:tid 140701685757696] 
> [client xx.xx.xx.xx:13183] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:11:57.511176 2014] [wsgi:error] [pid 2417:tid 140701627008768] 
> [client xx.xx.xx.xx:9079] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:11:59.504832 2014] [wsgi:error] [pid 2026:tid 140701568259840] 
> [client xx.xx.xx.xx:13194] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:02.893114 2014] [wsgi:error] [pid 1712:tid 140701593437952] 
> [client xx.xx.xx.xx:9123] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:15.471793 2014] [wsgi:error] [pid 2026:tid 140701643794176] 
> [client xx.xx.xx.xx:9145] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:12:18.741610 2014] [wsgi:error] [pid 1712:tid 140701840639744] 
> [client xx.xx.xx.xx:13230] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:32.470117 2014] [wsgi:error] [pid 1712:tid 140701840639744] 
> [client xx.xx.xx.xx:9723] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:33.336583 2014] [wsgi:error] [pid 2417:tid 140701807068928] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10126] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:36.421071 2014] [wsgi:error] [pid 1712:tid 140701660579584] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10148] mod_wsgi 
> (pid=1712): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:38.102948 2014] [wsgi:error] [pid 1712:tid 140701677364992] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10157] mod_wsgi 
> (pid=1712): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:38.186446 2014] [wsgi:error] [pid 1712:tid 140701685757696] 
> [client xx.xx.xx.xx:13868] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:19:42.206195 2014] [wsgi:error] [pid 1712:tid 140701823854336] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14302] mod_wsgi 
> (pid=1712): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:42.366307 2014] [wsgi:error] [pid 3039:tid 140701568259840] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14306] mod_wsgi 
> (pid=3039): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:46.209203 2014] [wsgi:error] [pid 2417:tid 140701593437952] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14308] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:47.026183 2014] [wsgi:error] [pid 2026:tid 140701677364992] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14312] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.092478 2014] [wsgi:error] [pid 1712:tid 140701652186880] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10163] mod_wsgi 
> (pid=1712): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.505943 2014] [wsgi:error] [pid 2417:tid 140701840639744] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14314] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.692284 2014] [wsgi:error] [pid 2417:tid 140701719328512] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10173] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:49.725615 2014] [wsgi:error] [pid 2026:tid 140701568259840] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14316] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:52.273296 2014] [wsgi:error] [pid 2026:tid 140701685757696] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10175] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:53.356815 2014] [wsgi:error] [pid 2026:tid 140701823854336] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14318] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:57.364586 2014] [wsgi:error] [pid 2026:tid 140701585045248] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10178] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:19:57.620572 2014] [wsgi:error] [pid 2417:tid 140701652186880] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14324] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:01.508145 2014] [wsgi:error] [pid 2026:tid 140701719328512] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14331] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:03.685263 2014] [wsgi:error] [pid 3039:tid 140701576652544] 
> [client xx.xx.xx.xx:9777] Timeout when reading response headers from daemon 
> process 'site-1': /home/ubuntu/snaptee-production/apache/wsgi.py
> [Mon Nov 03 11:20:07.131191 2014] [wsgi:error] [pid 2026:tid 140701610223360] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14337] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:07.210243 2014] [wsgi:error] [pid 2026:tid 140701601830656] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10185] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:10.990187 2014] [wsgi:error] [pid 2417:tid 140701660579584] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10206] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.343626 2014] [wsgi:error] [pid 2026:tid 140701807068928] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14353] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.383353 2014] [wsgi:error] [pid 2417:tid 140701601830656] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10210] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:12.514056 2014] [wsgi:error] [pid 2026:tid 140701568259840] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10213] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:13.106436 2014] [wsgi:error] [pid 2026:tid 140701677364992] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14356] mod_wsgi 
> (pid=2026): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:13.367651 2014] [wsgi:error] [pid 3039:tid 140701668972288] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10219] mod_wsgi 
> (pid=3039): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:17.423692 2014] [wsgi:error] [pid 2417:tid 140701610223360] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:14359] mod_wsgi 
> (pid=2417): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:17.450428 2014] [wsgi:error] [pid 3039:tid 140701815461632] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10221] mod_wsgi 
> (pid=3039): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> [Mon Nov 03 11:20:20.733221 2014] [wsgi:info] [pid 3794:tid 140056943240960] 
> mod_wsgi (pid=3794): Maximum requests reached, triggering immediate shutdown 
> 'site-1'.
> [Mon Nov 03 11:20:20.733323 2014] [wsgi:info] [pid 3794:tid 140057235269504] 
> mod_wsgi (pid=3794): Shutdown requested 'site-1'.
> [Mon Nov 03 11:20:20.733511 2014] [wsgi:info] [pid 3794:tid 140057235269504] 
> mod_wsgi (pid=3794): Stopping process 'site-1'.
> [Mon Nov 03 11:20:20.733528 2014] [wsgi:info] [pid 3794:tid 140057235269504] 
> mod_wsgi (pid=3794): Destroying interpreters.
> [Mon Nov 03 11:20:20.733538 2014] [wsgi:info] [pid 3794:tid 140057235269504] 
> mod_wsgi (pid=3794): Cleanup interpreter ''.
> [Mon Nov 03 11:20:20.734308 2014] [wsgi:error] [pid 3794:tid 140057235269504] 
> 2014-11-03 11:20:20,734 (3794/Dummy-7) newrelic.core.agent INFO - New Relic 
> Python Agent Shutdown
> [Mon Nov 03 11:20:20.680893 2014] [wsgi:error] [pid 1712:tid 140701710935808] 
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:10226] mod_wsgi 
> (pid=1712): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.1237.0.4.sock'.
> 
> Please advise. Thanks.
> 
> 
> On Monday, November 3, 2014 6:43:19 PM UTC+8, Graham Dumpleton wrote:
> On 03/11/2014, at 9:35 PM, Kelvin Wong <[email protected]> wrote:
> 
> I don't think the original error situation is caused by server overloading as 
> the traffic of the server on that time it's fine.
> 
> I didn't say it was.
> 
> I just want to rule out some conditions as returning 503 can be caused by 
> overloading the service.
> Overloading the servers can produce 503 but the error log messages do not 
> match with the original error log message.
> 
> And I am only addressing the message:
> 
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:54.109098 2014] [wsgi:error] [pid 
> 4615:tid 140372718073600] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:26793] mod_wsgi (pid=4615): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.3387.0.4.sock'.
> 
> That area of the code was rewritten in 4.X versions at some point and 
> introduced non blocking connects. Looking more at that specific message and 
> knowing the possibility that you were load testing I came across that 
> difference in Linux behaviour compared to what POSIX specifies.
> 
> So all I can do is address that specific point and that it is all I can do 
> until I can get those info level messages from the error logs so I can 
> understand whether the mod_wsgi daemon processes are actually restarting at 
> the same time.
> 
> Graham
> 
> 
> On Monday, November 3, 2014 5:58:47 PM UTC+8, Graham Dumpleton wrote:
> I still want any log messages either side of that point as it may confirm 
> what is going on, but what you might be encountering could be due to Linux 
> behaviour when performing a connect() on a UNIX socket in non blocking mode 
> not being POSIX compliant. Details:
> 
> http://lists.mysql.com/internals/35082
> 
> The mod_wsgi code is possibly not dealing with Linux's non POSIX compliance 
> and instead of interpreting it the same as connection refused (when the 
> listener socket backlog is full), it is regarding it as an outright failure.
> 
> So the cause of this if you are trying to do load testing is that you are 
> overwhelming the capacity of the mod_wsgi daemon processes causing the socket 
> listener queue to fill up. When it fills up then it is triggering the non 
> POSIX behaviour of Linux.
> 
> Can you confirm that you only saw this when you overwhelmed the server with 
> more traffic that you would realistically expect it to receive and way beyond 
> the capacity it could handle based on how it is configured?
> 
> Graham
> 
> On 03/11/2014, at 8:33 PM, Graham Dumpleton <[email protected]> wrote:
> 
> As I said before, I need to see messages either side of the event and with 
> Apache log level set to info.
> 
> Graham
> 
> On 03/11/2014, at 8:28 PM, Kelvin Wong <[email protected]> wrote:
> 
> I tried some load testing on the server to see if it is caused by overloading.
> 
> I found there will be some other log message when the server is overloading.
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:53.795864 2014] [wsgi:error] [pid 
> 3724:tid 140372768429824] [client xx.xx.xx.xx:26561] Truncated or oversized 
> response headers received from daemon process 'site-1': 
> /home/ubuntu/site-1/apache/wsgi.py
> 2014-11-03 07:47:01.591 [Mon Nov 03 07:46:54.109098 2014] [wsgi:error] [pid 
> 4615:tid 140372718073600] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:26793] mod_wsgi (pid=4615): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.3387.0.4.sock'.
> 
> Then I concluded that the situation is not caused by overloading the server.
> 
> Do you have any directions for me to reproduce the situation?
> 
> Thanks.
> 
> 
> On Monday, November 3, 2014 11:09:33 AM UTC+8, Kelvin Wong wrote:
> I found no cron is running on root and user.
> 
> According to the logrotate config, the log rotate weekly.
> But the error situation is likely happened nightly.
> 
> So I think it will not be my concern.
> 
> 
> On Monday, November 3, 2014 11:00:10 AM UTC+8, Graham Dumpleton wrote:
> And when is 'cron' configured to run logrotate? Does it correspond to when 
> you are seeing the issue?
> 
> On 03/11/2014, at 1:56 PM, Kelvin Wong <[email protected]> wrote:
> 
> Yes, the memory usage is under control now and I increased maximum-requests 
> by 4 times.
> 
> For the error, 
> Timeout when reading response headers from daemon process 'site-1': 
> /home/ubuntu/site-1/apache/wsgi.py
> came out as sudden, that's why I capture this message only. I also found main 
> Apache error log nothing special.
> 
> Now, I turned the log level to 'info' in order to obtain more messages.
> 
> About the log file rotation, I only found the ubuntu default rotation one 
> which is using logrotate.
> /var/log/apache2/*.log {
>         weekly
>         missingok
>         rotate 52
>         compress
>         delaycompress
>         notifempty
>         create 640 root adm
>         sharedscripts
>         postrotate
>                 if /etc/init.d/apache2 status > /dev/null ; then \
>                     /etc/init.d/apache2 reload > /dev/null; \
>                 fi;
>         endscript
>         prerotate
>                 if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
>                         run-parts /etc/logrotate.d/httpd-prerotate; \
>                 fi; \
>         endscript
> }
> It will do Apache reload on postrotate. I research a little bit, the reload 
> is needed for Apache handling file.
> Some suggested "copytruncate" on the config to avoid reload. Will you also 
> suggest that?
> 
> 
> Thanks.
> 
> 
> On Monday, November 3, 2014 8:00:10 AM UTC+8, Graham Dumpleton wrote:
> 
> On 02/11/2014, at 2:18 AM, Kelvin Wong <[email protected]> wrote:
> 
> 
> I have some updates on the issue as it happened again. This time I captured 
> some data from New Relic.
> 
> From the Requests.png, I found that there is an spike aroung 22:23 and check 
> the Apache log. I found there are many as following.
> Timeout when reading response headers from daemon process 'site-1': 
> /home/ubuntu/site-1/apache/wsgi.py
> 
> Spike also found on the Workers.png during that time period. The graph is 
> showing most of the worker is doing write?
> After a few minute, the following log appear again, and start returning 503.
> (11)Resource temporarily unavailable: [client xx.xx.xx.xx:xxxxx] mod_wsgi 
> (pid=27816): Unable to connect to WSGI daemon process 'site-1' on 
> '/var/run/apache2/wsgi.17227.2.3.sock'.
> 
> Can you tell what happened from those graphs?
> 
> What I need to see is more of the log file around those events.
> 
> Ensure you have LogLevel set to 'info' in the Apache configuration as 
> mod_wsgi will then log more messages about why processes restarted.
> 
> Also ensure you are checking for message around the time in the global Apache 
> error log and not just in any virtual host specific error log, as some 
> messages, such as evidence of a restart or process crashes will only appear 
> in the main Apache error log.
> 
> With memory growth under control:
> 
> 
> 
> turning to look at daemon process restarts:
> 
> 
> 
> they are obviously more frequent. Zooming in, each process is restarting 
> about every 5 minutes, although varies over time.
> 
> I would suggest you can increase maximum-requests, possibly as high as 4 
> times what you are using.
> 
> When enabling 'info' logging that will reduce the amount of noise in the 
> error logs which may hide the specific anomalous event you are trying to 
> track down.
> 
> BTW. Did you look into whether any log file rotation is set up or not and 
> what did you find?
> 
> Depending on the system you might be able to find whether something is 
> running by looking at one cron jobs are running for root.
> 
>     sudo crontab -l
> 
> Graham
> 
> 
> Thanks.
> 
> 
> On Friday, October 31, 2014 5:08:27 PM UTC+8, Graham Dumpleton wrote:
> 
> On 31/10/2014, at 7:26 PM, Kelvin Wong <[email protected]> wrote:
> 
> Oh. I see. Thanks.
> 
> One more question.
> As I set the graceful-timeout on mod_wsgi process, then when Apache do 
> graceful restart, will mod_wsgi obey that timeout setting?
> 
> No. It is impossible. There is simply no way for the mod_wsgi daemon 
> processes to act in coordination with the Apache child worker processes on an 
> Apache graceful restart.
> 
> This is because the only way that mod_wsgi can create its distinct child 
> processes and have them managed by Apache at all is using a mechanism that in 
> the Apache runtime library is referred to as a 'other processes'.
> 
> It doesn't matter whether Apache as a whole is signalled for a normal 
> shutdown (SIGHUP) or a graceful restart (SIGUSR1), Apache will in turn always 
> send a SIGINT on to 'other processes' such as those created by mod_wsgi. If 
> those 'other processes' don't shutdown immediately it will signal them again 
> at one second intervals until 3 seconds have elapsed, at which point it will 
> send them a SIGKILL. So Apache only supports a brutal shutdown of that sort 
> for 'other processes'.
> 
> For its own child worker processes, to even support a graceful restart is a 
> non trivial thing. It isn't just a matter of sending the child worker 
> processes a SIGUSR1. Instead there is an elaborate scheme involving what is 
> called a pipe of death between the Apache parent and the child worker 
> processes, as well as shared state via shared memory between the Apache 
> parent process so the child worker processes now what sort of shutdown has 
> been triggered. That sort of mechanism to support a graceful shutdown for 
> 'other processes' simply does exist.
> 
> So it is how it is and there is no way to change things. In practice it 
> hasn't caused any issues and web requests being interrupted is always a risk 
> that a web application has to deal with anyway.
> 
> Graham
> 
> On Friday, October 31, 2014 4:16:55 PM UTC+8, Graham Dumpleton wrote:
> 
> 
> On 31/10/2014, at 6:53 PM, Kelvin Wong <[email protected]> wrote:
> 
> You means that the OS triggered Apache graceful restart? If so, why the OS do 
> that?
> 
> Yes and I just told you why, as part of a log rotation mechanism.
> 
> Try googling 'log file rotation' and see what mechanism is used on your 
> specific Linux system.
> 
> One link about one such solution which is often used is:
> 
> http://www.cyberciti.biz/faq/how-do-i-rotate-log-files/
> 
> Apache does itself have a way of handling log rotation itself, but likely 
> ignored by Linux distros which use their own preferred mechanism.
> 
> Graham
> 
> 
> On Friday, October 31, 2014 3:45:35 PM UTC+8, Graham Dumpleton wrote:
> Your operating system could be signalling Apache during the middle of the 
> night so that the log files can be rotated.
> 
> Graham
> 
> On 31/10/2014, at 6:36 PM, Kelvin Wong <[email protected]> wrote:
> 
> I checked the memory of overall Apache process, it goes up and down a little. 
> It Seemed the "maximum-requests" is doing its job.
> 
> I found that when I do "sudo service apache reload", Apache will do graceful 
> restart as it shown in main Apache error log.
> 
> 
> [Sun Oct 26 07:34:37.198806 2014] [wsgi:error] [pid 27816:tid 
> 140052910765824] (11)Resource temporarily unavailable: [client 
> xx.xx.xx.xx:xxxxx] mod_wsgi (pid=27816): Unable to connect to WSGI daemon 
> process 'site-1' on '/var/run/apache2/wsgi.17227.2.3.sock'.
> When before above situation happened, Apache also doing graceful restart.
> But this should not be triggered by anyone as I mentioned it happened during 
> sleep time.
> Is there any reasons that Apache do graceful restart by itself?
> 
> Thanks.
> 
> 
> On Friday, October 31, 2014 2:55:32 PM UTC+8, Graham Dumpleton wrote:
> BTW, in your NR agent configuration file find the 
> error_collector.ignore_errors setting and change it to:
> 
> error_collector.ignore_errors = django.http.request:UnreadablePostError
> 
> These indicate the HTTP client connection dropping before all content had 
> been sent.
> 
> Generally you want to ignore these and eliminate them from your error count 
> as they do not represent an error in your application but are the result of a 
> client problem.
> 
> Adding that configuration setting will cause them to be ignored and not 
> recorded as an error.
> 
> Graham
> 
> On 31/10/2014, at 5:41 PM, Graham Dumpleton <[email protected]> wrote:
> 
> 
> On 31/10/2014, at 5:04 PM, Kelvin Wong <[email protected]> wrote:
> 
> Thanks for the details!
> 
> I now have a better picture on how it works.
> I employed all the recommendations you suggested. I will keep monitoring the 
> processes.
> 
> But how about the graceful restart signaled by Apache? Why it happened?
> 
> As far as the message:
> 
> [Tue Oct 28 02:20:40.722140 2014] [wsgi:error] [pid 24158:tid 
> 140182690981632] (2)No such file or directory: [client 24.171.250.159:60769] 
> mod_wsgi (pid=24158): Unable to connect to WSGI daemon process 
> 'snaptee-production-api-ssl' on'/var/run/apache2/wsgi.30188.7.3.sock'.
> 
> goes, as I said I believe that could only be the result of an Apache graceful 
> restart.
> 
> My recollection is that a 'reload' under a Linux system using its process 
> management tools triggers an Apache graceful restart where as a 'restart' 
> does a normal Apache restart.
> 
> I believe you said you used 'reload' and so that
> ...
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/d/optout.
> <error.log>

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to