Hi all,

I've been a mod_wsgi user for many years (Graham, thank you for your 
fantastic community support!), but this week ran into a mystery I haven't 
been able to solve on my own.

We've been running a fairly hefty Django app in production with mod_wsgi 
for years without much issue.  In August, with no obviously correlated 
change in code or server architecture, we started having issues where a 
restart (usually triggered by `touch`ing the WSGI script with 
`WSGIScriptReloading On`, though sometimes also by `systemctl restart 
httpd.service`) would occasionally lead to a stream of 504 timeouts (and 
sometimes some 503s as well) that never let up.  Another restart would 
sometimes fix it, but not always.  The issue seems to be load-related -- 
the busier the server is, the more likely it is to get stuck in the 504 
loop.  Most restarts work fine and yield a normally running site after a 
brief pause while the app is loaded into memory.
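For anyone trying to reproduce this, a restart-and-wait cycle can be timed 
with a small script.  This is a minimal sketch, not something from our 
production setup: it assumes that bumping the script's modification time 
(what `touch` does) is enough for `WSGIScriptReloading On` to restart the 
daemon processes, and `touch`, `http_status` and `wait_until_healthy` are 
hypothetical helper names.  It counts the 503/504 responses seen while 
waiting for the first 200.

```python
import os
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable, Optional


def touch(path: str) -> None:
    """Set the file's mtime to now, like `touch` on an existing file.

    Assumption: a newer mtime on the WSGI script is what
    WSGIScriptReloading On reacts to. Unlike `touch`, this does not
    create a missing file.
    """
    os.utime(path, None)


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if nothing came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 503, 504 and other error statuses end up here
    except (urllib.error.URLError, OSError):
        return None  # connection refused, socket timeout, etc.


def wait_until_healthy(
    url: str,
    deadline: float = 300.0,
    interval: float = 1.0,
    fetch: Callable[[str], Optional[int]] = http_status,
) -> tuple:
    """Poll url until it returns 200.

    Returns (elapsed_seconds, status_counts); elapsed_seconds is None if
    the deadline passes first, which is what a stuck 504 loop looks like.
    """
    start = time.monotonic()
    counts = Counter()
    while time.monotonic() - start < deadline:
        status = fetch(url)
        counts[status] += 1
        if status == 200:
            return time.monotonic() - start, counts
        time.sleep(interval)
    return None, counts
```

Usage would be something like `touch("/path/to/django-wsgi.py")` followed 
by `wait_until_healthy("https://www.example.com/")`, where the URL is a 
placeholder.  The `fetch` parameter is there only so the polling logic can 
be exercised without a live server.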

While troubleshooting today (not under production load), I noticed 
something that I think is likely exacerbating the load-related restart 
timeouts.  After a flurry of activity on the initial server (re)start, 
which clearly includes loading our WSGI script (I see entries in the 
Apache error log from Python packages it imports), there's a period of 
roughly 45 seconds during which the CPU is idle and no requests are served 
via mod_wsgi.  Then it wakes up and finally emits `Started thread 0 in 
daemon process ...` log messages, and a few seconds later it's able to 
respond to HTTP requests.
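To put a number on that idle window from the logs rather than by watching 
the CPU, a sketch like the following finds the longest silence between 
consecutive timestamped lines in the Apache error log.  It assumes Apache 
2.4's default `ErrorLogFormat` (lines starting with a timestamp such as 
`[Tue Sep 01 12:34:56.789012 2020]`) and English day/month names; 
`parse_ts` and `largest_gap` are hypothetical helpers, not part of 
mod_wsgi or Apache.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed default Apache 2.4 error log prefix:
# "[Tue Sep 01 12:34:56.789012 2020] [module:level] [pid N] message"
TIMESTAMP = re.compile(
    r"^\[(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{1,6} \d{4})\]"
)


def parse_ts(line: str) -> Optional[datetime]:
    """Return the line's timestamp, or None for lines without one."""
    m = TIMESTAMP.match(line)
    if m is None:
        return None  # continuation lines, tracebacks, custom log formats
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S.%f %Y")


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[float, str, str]]:
    """Return (gap_seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if there are < 2."""
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line.rstrip("\n"), line.rstrip("\n"))
        prev_ts, prev_line = ts, line
    return best
```

For example, `with open("/var/log/httpd/error_log") as f: 
print(largest_gap(f))`, where the log path is an assumption (the usual 
location for the Amazon Linux 2 httpd package).  Feeding it only the lines 
around a single restart keeps unrelated quiet periods, such as overnight, 
from winning.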

*Any idea what could cause that ~45-second idle period during startup?*  
I've tried tuning the `*-timeout` options for WSGIDaemonProcess, with no 
apparent effect on the idle time.  I also tried disabling our New Relic APM 
code to rule out a network API bottleneck.

Software versions:

* Amazon Linux 2
* Python 3.6 (via IUS: https://ius.io/ )
* mod_wsgi/4.6.2 (also via IUS, compiled against Python 3.6)
* Apache/2.4.46
* Django 2.2

Apache config:
WSGIDaemonProcess eslive display-name='(wsgi:es-site)' \
  processes=6 threads=1 \
  user=apache group=apache \
  python-home=/path/to/virtualenv \
  python-path=/path/to/code/root \
  python-eggs=/var/www/.python-eggs \
  lang='en_US.UTF-8' locale='en_US.UTF-8' \
  queue-timeout=45 \
  socket-timeout=60 \
  connect-timeout=15 \
  request-timeout=120 \
  startup-timeout=30 \
  deadlock-timeout=60 \
  eviction-timeout=0 \
  shutdown-timeout=5 \
  graceful-timeout=15 \
  restart-interval=0 \
  inactivity-timeout=0 \
  maximum-requests=0
WSGIImportScript /path/to/django-wsgi.py \
  process-group=eslive application-group=%{GLOBAL}
WSGISocketPrefix run/httpd-wsgi
<VirtualHost ...>
  WSGIScriptAlias / /path/to/django-wsgi.py \
    process-group=eslive application-group=%{GLOBAL}
  WSGIPassAuthorization On
</VirtualHost>
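When experimenting with the `*-timeout` options, it can help to pull them 
out of the directive and list them side by side.  A minimal sketch, 
assuming the `WSGIDaemonProcess` text is copied verbatim from the config 
(backslash line continuations included); `daemon_options` and `timeouts` 
are hypothetical helpers that only parse text and do not ask Apache what 
it actually loaded.

```python
import shlex
from typing import Dict


def daemon_options(directive: str) -> Dict[str, str]:
    """Parse the key=value options of a WSGIDaemonProcess directive."""
    text = directive.replace("\\\n", " ")  # join backslash-continued lines
    tokens = shlex.split(text)  # also strips quotes, e.g. display-name='...'
    if not tokens or tokens[0] != "WSGIDaemonProcess":
        raise ValueError("not a WSGIDaemonProcess directive")
    opts = {}
    for token in tokens[2:]:  # tokens[1] is the process group name
        key, sep, value = token.partition("=")
        if sep:
            opts[key] = value
    return opts


def timeouts(directive: str) -> Dict[str, int]:
    """Return only the *-timeout options, converted to integer seconds."""
    return {
        key: int(value)
        for key, value in daemon_options(directive).items()
        if key.endswith("-timeout")
    }
```

Something like `print(sorted(timeouts(cfg).items()))`, with `cfg` holding 
the directive above, prints each timeout next to its value in seconds.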

Thanks in advance for any recommendations!

-Jamie

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/modwsgi/bcb386ac-7c83-459d-bced-792d535a09d0n%40googlegroups.com.