Re: [modwsgi] Errors when gracefully restarting more than 1 daemon process at a time

Graham Dumpleton Mon, 26 Aug 2019 23:07:13 -0700


> On 27 Aug 2019, at 2:30 pm, Jay <[email protected]> wrote:
> 
> Hi Graham,
> 
> First off, thanks for all your work on mod_wsgi and the docker images! It's 
> been a tremendous help.


All the docker images I created are quite out of date and not actively 
maintained as there wasn't enough interest in them to justify the effort on 
them. Which are you using and how old is the mod_wsgi version?

> I'm running an API server using Django, and for memory reasons, I'd like to 
> gracefully restart the daemon processes periodically. What I've found is that 
> using `restart-interval` or sending `SIGUSR1` to multiple daemon processes at 
> the same time causes my app (which is behind a load balancer) to return 502s 
> (bad gateway) to the consumer. This doesn't seem to happen when I send 
> `SIGUSR1` to the daemon processes 1 at a time, though.
> 
> This is what I'm using in my `server_args`:
> --server-mpm event
> --processes 4
> --threads 16
> --application-type module
> --url-alias /static static
> --compress-responses
> --log-level info
> --startup-log
> --keep-alive-timeout 5
> --server-status
> --request-timeout 120
> --shutdown-timeout 120
> app.wsgi
> 
> 
> Is this behavior expected? One guess is that there is a race condition where 
> if multiple daemon processes undergo the shutdown sequence at nearly the same 
> time, some requests can get routed to hit a daemon process that has already 
> stopped accepting new requests.

That shouldn't occur. The code uses what is called a cross process mutex. A 
daemon process will only acquire that mutex lock when it is in a running state 
and has capacity to handle requests. If multiple daemon processes were restarted 
at the same time, all that should happen is that requests will be queued up in 
the socket listener queue between Apache child processes and daemon processes, 
until a daemon process is ready to start accepting requests again. The queue 
depth is usually 100, which is more than the whole Apache capacity anyway, so 
you shouldn't even be able to fill it and start getting errors.
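
As a rough illustration of the queueing described above, here is a minimal 
Python sketch. It is not mod_wsgi code: a loopback TCP socket stands in for the 
socket between the Apache child processes and the daemon processes, and the 
function name and parameters are made up for the example. Clients connect and 
send while nothing is accepting, and are only served once accept() starts being 
called again.

```python
import socket


def queue_then_accept(pending=10, backlog=100):
    """Connect `pending` clients before any accept() call, then serve them."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(backlog)          # depth of the listener queue
    listener.settimeout(5)
    addr = listener.getsockname()

    # Phase 1: nobody is accepting (stand-in for daemons mid-restart).
    # The connections complete anyway and wait in the listener queue.
    clients = []
    for i in range(pending):
        client = socket.create_connection(addr, timeout=5)
        client.sendall(b"request %d\n" % i)
        clients.append(client)

    # Phase 2: a "daemon" is ready again and drains the queue.
    served = []
    for _ in range(pending):
        conn, _peer = listener.accept()
        conn.settimeout(5)
        served.append(conn.recv(1024).decode().strip())
        conn.sendall(b"ok\n")
        conn.close()

    replies = [client.recv(1024) for client in clients]
    for client in clients:
        client.close()
    listener.close()
    return served, replies


if __name__ == "__main__":
    served, replies = queue_then_accept(pending=5)
    print(len(served), "queued requests served after accept resumed")
```

The point is only that a connection made while no process is accepting waits 
in the queue instead of failing, as long as the queue is not full.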

Further, there are some timeouts in play which mean that it tries to only 
restart a daemon process when there are no active requests being handled by 
that process.

    optparse.make_option('--graceful-timeout', type='int', default=15,
            metavar='SECONDS', help='Grace period for requests to complete '
            'normally, while still accepting new requests, when worker '
            'processes are being shutdown and restarted due to maximum '
            'requests being reached or restart interval having expired. '
            'Defaults to 15 seconds.'),
    optparse.make_option('--eviction-timeout', type='int', default=0,
            metavar='SECONDS', help='Grace period for requests to complete '
            'normally, while still accepting new requests, when the WSGI '
            'application is being evicted from the worker processes, and '
            'the process restarted, due to forced graceful restart signal. '
            'Defaults to timeout specified by \'--graceful-timeout\' '
            'option.'),

The eviction timeout should come into play here, and because it falls back to 
the graceful timeout of 15 seconds, I would be surprised if the processes kept 
in lock step and really ended up restarting at the exact same time. They should 
naturally drift apart, unless you have insignificant traffic, in which case I 
also can't see how you would get an error, as requests should still queue up.
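
To make the defaults concrete, here is a small sketch of how the two options 
resolve when neither is given. The fallback rule is inferred from the help text 
above rather than copied from the mod_wsgi source, and resolve_timeouts is a 
hypothetical helper name.

```python
import optparse


def resolve_timeouts(argv):
    """Return (graceful_timeout, effective_eviction_timeout) for argv."""
    parser = optparse.OptionParser()
    parser.add_option('--graceful-timeout', type='int', default=15)
    parser.add_option('--eviction-timeout', type='int', default=0)
    options, _args = parser.parse_args(argv)

    # Assumption taken from the help text: an eviction timeout of 0 means
    # "use whatever --graceful-timeout is set to".
    eviction = options.eviction_timeout or options.graceful_timeout
    return options.graceful_timeout, eviction


if __name__ == "__main__":
    print(resolve_timeouts([]))
    print(resolve_timeouts(['--eviction-timeout', '5']))
```

With the server_args quoted earlier, neither option is set, so under this 
reading both timeouts would be 15 seconds.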

The only things I can think of are that you are sending the SIGUSR1 to the 
Apache parent process as well, and not just the mod_wsgi daemon processes, or 
that you are actually managing to restart the whole container somehow.
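
If you do script the signals yourself, something along these lines would 
signal the daemon processes one at a time and skip the Apache parent. This is 
only a sketch: restart_one_at_a_time is a hypothetical helper, how you collect 
the daemon process IDs is left to you, and the 20 second delay is just an 
assumed value chosen to be longer than the 15 second timeout.

```python
import os
import signal
import time


def restart_one_at_a_time(daemon_pids, parent_pid, delay=20.0,
                          send=os.kill, sleep=time.sleep):
    """Send SIGUSR1 to each daemon process in turn, never to the parent.

    `send` and `sleep` are injectable so the logic can be exercised
    without signalling real processes.
    """
    targets = [pid for pid in daemon_pids if pid != parent_pid]
    signalled = []
    for index, pid in enumerate(targets):
        send(pid, signal.SIGUSR1)
        signalled.append(pid)
        # Wait between processes, but not after the last one.
        if index < len(targets) - 1:
            sleep(delay)
    return signalled


if __name__ == "__main__":
    # Dry run with made-up process IDs; nothing is actually signalled.
    done = restart_one_at_a_time(
        [101, 102, 103], parent_pid=101,
        send=lambda pid, sig: print("would signal", pid),
        sleep=lambda seconds: None)
    print(done)
```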

What do the logs show around the time when you send the signals? You are using 
INFO level logging for Apache, so it should show lots of mod_wsgi log messages 
about what is happening with the restarting daemon mode processes.

> Is there an out-of-the-box solution to handle this? Or is the workaround to 
> run a job that sends the graceful restart signal to the daemon processes 1 at 
> a time?
> 
> Thanks!
> 
> Best,
> Jay
> 

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/modwsgi/60A26177-F169-4BBF-91D9-7B8A97495FDA%40gmail.com.
