Hi all, I've been a mod_wsgi user for many years (Graham, thank you for your fantastic community support!), but this week I ran into a mystery I haven't been able to solve on my own.
We've been running a fairly hefty Django app in production with mod_wsgi for years without much issue. In August, with no obviously correlated change in code or server architecture, we started having issues where a restart (usually triggered by `touch`ing the WSGI script via `WSGIScriptReloading On`, though sometimes also by `systemctl restart httpd.service`) would occasionally lead to a stream of 504 timeouts (and sometimes some 503s as well) lasting indefinitely. Another restart would sometimes fix it, but not always.

The issue seems to be load related: the busier the server is, the more likely it is to get stuck in the 504 loop. Most restarts work fine and yield a normally running site after a brief pause while the app is loaded into memory.

While troubleshooting today (not under production load), I noticed something that I think is likely exacerbating the load-related restart timeouts. After a flurry of activity on initial server (re)start, which clearly includes loading our WSGI script (I see entries in the Apache error log related to Python packages it imports), there's a period of roughly 45 seconds when the CPU is idle and no requests are served via mod_wsgi. Only then does it wake up and emit `Started thread 0 in daemon process ...` log messages, and a few seconds later it's able to reply to HTTP requests.

*Any idea what could cause that ~45 second idle period during startup?*

I've tried tuning the `*-timeout` options for WSGIDaemonProcess, with no apparent effect on the idle time. I also tried disabling our NewRelic APM code to rule out a network API bottleneck.
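For anyone who wants to measure the idle period rather than eyeball it, here is a minimal sketch that finds the longest silent stretch between consecutive entries in the Apache error log. It assumes the default Apache 2.4 error log timestamp layout (for example `[Mon Oct 05 10:00:00.000000 2020]`); a custom `ErrorLogFormat` would need a different pattern. The names `parse_timestamp` and `largest_gap` are just illustrative helpers, not part of mod_wsgi or Apache.

```python
import re
from datetime import datetime

# Assumption: default Apache 2.4 error log timestamps,
# e.g. "[Mon Oct 05 10:00:00.000000 2020] [wsgi:info] ..."
TS_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{6} \d{4})\]")
TS_FORMAT = "%a %b %d %H:%M:%S.%f %Y"


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    match = TS_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), TS_FORMAT)


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if fewer than two
    timestamped lines are present."""
    previous = None  # (timestamp, line) of the last timestamped entry seen
    best = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip continuation lines, tracebacks, etc.
        if previous is not None:
            gap = (ts - previous[0]).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, previous[1], line)
        previous = (ts, line)
    return best
```

Feeding it the error log lines written during a single restart (with `LogLevel info` or higher so the mod_wsgi startup messages appear) shows which two entries bracket the pause, which helps narrow down what the daemon processes were waiting on.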
Software versions:

* Amazon Linux 2
* Python 3.6 (via IUS: https://ius.io/ )
* mod_wsgi/4.6.2 (also via IUS, compiled against Python 3.6)
* Apache/2.4.46
* Django 2.2

Apache config:

    WSGIDaemonProcess eslive display-name='(wsgi:es-site)' \
        processes=6 threads=1 \
        user=apache group=apache \
        python-home=/path/to/virtualenv \
        python-path=/path/to/code/root \
        python-eggs=/var/www/.python-eggs \
        lang='en_US.UTF-8' locale='en_US.UTF-8' \
        queue-timeout=45 \
        socket-timeout=60 \
        connect-timeout=15 \
        request-timeout=120 \
        startup-timeout=30 \
        deadlock-timeout=60 \
        eviction-timeout=0 \
        shutdown-timeout=5 \
        graceful-timeout=15 \
        restart-interval=0 \
        inactivity-timeout=0 \
        maximum-requests=0

    WSGIImportScript /path/to/django-wsgi.py \
        process-group=eslive application-group=%{GLOBAL}

    WSGISocketPrefix run/httpd-wsgi

    <VirtualHost ...>
        WSGIScriptAlias / /path/to/django-wsgi.py \
            process-group=eslive application-group=%{GLOBAL}
        WSGIPassAuthorization On
    </VirtualHost>

Thanks in advance for any recommendations!

-Jamie
