Thank you for the extremely quick turn-around and suggestions, Graham!
Per your advice, I removed `WSGIImportScript` and added
`WSGIRestrictEmbedded On`.
After some more research and setting up a dummy hello-world Django
app, I confirmed the startup delay wasn't in mod_wsgi after all. It
turned out to be a large number of static files being scanned at
startup by WhiteNoise (https://github.com/evansd/whitenoise/).
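In case it helps anyone searching later, the relevant pieces of our
Django settings look roughly like the sketch below (paths are
placeholders, and I'm going from the WhiteNoise docs here). With the
default WHITENOISE_AUTOREFRESH = False, WhiteNoise walks everything
under STATIC_ROOT when the application starts up in order to build its
in-memory file index, so a very large static tree makes each daemon
process slow to start:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise serves the collected static files directly from Django.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... remaining middleware ...
]

# WhiteNoise builds an index of every file under this directory when
# the application starts up.
STATIC_ROOT = "/path/to/collected-static"  # placeholder path

# Default value; with autorefresh off the index is built once at
# startup, which is where our delay came from. Setting it to True
# trades the startup scan for per-request filesystem checks and is
# aimed at development.
WHITENOISE_AUTOREFRESH = False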
Thanks again for the quick response and apologies for the noise. I had
been spinning my wheels for longer than I care to admit and appreciate
your help ruling out mod_wsgi as the cause.
Best,
Jamie
On 10/22/20 6:40 PM, Graham Dumpleton wrote:
Remove:
WSGIImportScript /path/to/django-wsgi.py \
process-group=eslive application-group=%{GLOBAL}
Setting both process-group and application-group on WSGIScriptAlias
has the same effect as WSGIImportScript: it forces preloading of the
WSGI script file. I am not sure what will happen when both ways of
forcing preloading are used at the same time.
Also, a memory corruption bug was recently reported to me along with
a fix. This has been an outstanding issue for many years, but it
occurred so rarely on full Linux and macOS platforms (Alpine Linux
would crash all the time, though) that I was never able to track it
down. That bug relates to preloading of the WSGI script file, so there
is an outside chance it is related.
Disabling the preloading may not be desirable, though, because lazy
loading carries a greater risk of delaying the first requests, as they
can queue up on a process which is still loading the application. That
said, it may not be noticeable since you only have one thread per
process. So it is worth trying:
WSGIProcessGroup eslive
WSGIScriptAlias / /path/to/django-wsgi.py application-group=%{GLOBAL}
which, because there is no WSGIImportScript and process-group and
application-group aren't both set on WSGIScriptAlias, means no
preloading.
BTW, if you don't already have it set, ensure you are setting:
WSGIRestrictEmbedded On
if you are only using daemon mode. It isn't related to your issue, but
it is good practice and cuts down on memory usage and startup load in
the Apache child worker processes.
So first up, try that. The bug fix I mentioned hasn't actually been
released yet, as there was some other unfinished stuff in the code
which I wasn't sure I wanted to change. If you wanted to be brave,
though, you could try the 'develop' branch of mod_wsgi on GitHub. If
you can replicate the problem in a testing system, you could perhaps
try it there.
The only other thing I can think of is a cross-process conflict with
initialisation done by your app against a database or backend service
when multiple processes are starting up at the same time.
Finally, I'm not sure whether it could be adapted, but as the very
first thing in the WSGI script file you could start a background
thread which watches for an event set at the end of the WSGI script
file import; if it takes more than a certain time to see that event,
indicating a slow WSGI script file load, it dumps out Python stack
traces. Code related to this is found at:
https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html#extracting-python-stack-traces
It will need to be updated to Python 3 as it is probably still
Python 2, and then adapted as mentioned.
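A rough Python 3 sketch of the idea (untested; the 15 second threshold
and writing to stderr are just placeholder choices), with the monitor
started at the very top of the WSGI script file and the event set on
the very last line:

import sys
import threading
import traceback

# Set at the very end of the WSGI script file once all imports finish.
_wsgi_script_loaded = threading.Event()

STARTUP_WARNING_SECONDS = 15  # placeholder threshold for a "slow" load

def _dump_stacks_if_slow():
    # Wait for the import to finish; if it doesn't within the
    # threshold, dump the stack of every thread to stderr, which ends
    # up in the Apache error log.
    if not _wsgi_script_loaded.wait(STARTUP_WARNING_SECONDS):
        print('WSGI script still loading after %d seconds, dumping '
              'stacks:' % STARTUP_WARNING_SECONDS, file=sys.stderr)
        for thread_id, frame in sys._current_frames().items():
            print('Thread %d:' % thread_id, file=sys.stderr)
            traceback.print_stack(frame, file=sys.stderr)
        sys.stderr.flush()

threading.Thread(target=_dump_stacks_if_slow, daemon=True).start()

# ... the rest of the WSGI script file (creating the Django
# application, etc.) ...

# Must be the very last line of the WSGI script file.
_wsgi_script_loaded.set()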
Graham
On 23 Oct 2020, at 8:52 am, Jamie Biggar <[email protected]> wrote:
Hi all,
I've been a mod_wsgi user for many years (Graham, thank you for your
fantastic community support!), but this week ran into a mystery I
haven't been able to solve on my own.
We've been running a fairly hefty Django app in production with
mod_wsgi for years without much issue. In August, with no obviously
correlated change in code or server architecture, we started having
issues where a restart (usually triggered by `touch`ing the WSGI
script via `WSGIScriptReloading On`, though sometimes also by
`systemctl restart httpd.service`) would occasionally lead to an
unending stream of 504 timeouts (and sometimes some 503s as well).
Another restart would sometimes fix it, but
not always. The issue seems to be load related -- the busier the
server is, the more likely it is to get stuck in the 504 loop. Most
restarts would work fine and yield a normally-running site after a
brief pause as the app was loaded into memory.
While troubleshooting today (not under production load), I noticed
something that I think is likely exacerbating load-related restart
timeout issues: it seems that after a flurry of activity on initial
server (re)start which clearly includes loading our WSGI script (as I
see entries in the Apache error log related to Python packages it
imports), there's a period of roughly 45 seconds when the CPU is idle
and no requests are served via mod_wsgi. Then it wakes up, finally
emits `Started thread 0 in daemon process ...` log messages, and a few
seconds later it's able to reply to HTTP requests.
*Any idea what could cause that ~45 second idle period during
startup?* I've tried tuning the *-timeout options for
WSGIDaemonProcess, with no apparent effect on the idle time. I also
tried disabling our NewRelic APM code to rule out a network API
bottleneck.
Software versions:
* Amazon Linux 2
* Python 3.6 (via IUS: https://ius.io/ )
* mod_wsgi/4.6.2 (also via IUS, compiled against Python 3.6)
* Apache/2.4.46
* Django 2.2
Apache config:
WSGIDaemonProcess eslive display-name='(wsgi:es-site)' \
processes=6 threads=1 \
user=apache group=apache \
python-home=/path/to/virtualenv \
python-path=/path/to/code/root \
python-eggs=/var/www/.python-eggs \
lang='en_US.UTF-8' locale='en_US.UTF-8' \
queue-timeout=45 \
socket-timeout=60 \
connect-timeout=15 \
request-timeout=120 \
startup-timeout=30 \
deadlock-timeout=60 \
eviction-timeout=0 \
shutdown-timeout=5 \
graceful-timeout=15 \
restart-interval=0 \
inactivity-timeout=0 \
maximum-requests=0
WSGIImportScript /path/to/django-wsgi.py \
process-group=eslive application-group=%{GLOBAL}
WSGISocketPrefix run/httpd-wsgi
<VirtualHost ...>
WSGIScriptAlias / /path/to/django-wsgi.py \
process-group=eslive application-group=%{GLOBAL}
WSGIPassAuthorization On
</VirtualHost>
Thanks in advance for any recommendations!
-Jamie