Re: [modwsgi] True graceful restarts for mod_wsgi daemon mode

Tomi Belan Sat, 07 May 2022 20:01:42 -0700

I didn't expect such a fast answer! Thank you!

I'm definitely interested if you have any other thoughts about writing a 
custom process manager. Especially any potential issues or edge cases that 
must be taken care of.


I will probably try my hand at it just for fun, but I'm not at all familiar 
with Apache and mod_wsgi internals, so it's pretty daunting. It probably 
won't go anywhere.

Looking at the code, MPM modules do have some superpowers, such as access 
to struct ap_unixd_mpm_retained_data. Normal modules will have a harder 
time distinguishing between a graceful restart and full shutdown. Maybe by 
registering one cleanup function on pconf and another one on ap_pglobal...? 
Who knows.

As for my app:
Partitioning by URL is an interesting idea. Sadly it won't work for my app, 
because almost every request can write these files, and the URL doesn't 
reveal which requests may be slow. Plus we're forced to use prefork because 
we need a certain ancient single-sign-on module which is not thread safe. 
Plus we probably can't use embedded mode anyway, because the server runs 
two wsgi apps with different virtualenvs, and it needs 
"WSGIApplicationGroup %{GLOBAL}" for the lxml library. As I understand it, 
embedded mode can't do that. Currently they are two daemon process groups.

If I'm being honest with myself, the most pragmatic solution might be to 
switch to Gunicorn. ;) But even if it comes to that, this puzzle still 
interests me. It would be neat to find a proper solution, whether I 
ultimately use it in production or not.
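
In the meantime, the temporary-file-and-atomic-rename workaround mentioned in my original message (quoted below) can be sketched roughly like this. This is only a minimal sketch, not what the app currently does: the helper name atomic_write and the ".tmp-" prefix are made up for illustration, and it assumes a POSIX filesystem where the temporary file and the target live in the same directory.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created next to the target so that both are on
    # the same filesystem; os.replace() is only atomic in that case.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Remove the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A SIGKILL in the middle of the write would still leave a stray ".tmp-" file behind, since no cleanup code can run, so a batch job that reads every file in the directory would have to skip names with that prefix. The target file itself would be either the old version or the complete new one.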

On Sunday, May 8, 2022 at 1:31:33 AM UTC+2 Graham Dumpleton wrote:

> Fixing my bad edit at the end so it makes proper sense:
>
> The reason I am pointing at that is that if there is only one URL of your 
> application which is writing these files, then you could consider 
> delegating just that one URL to be handled under mod_wsgi embedded mode, 
> rather than in the daemon mode process with the rest of your application 
> code. As long as the request handler for that doesn't drag in too much 
> code, and you aren't using the prefork MPM, the memory cost in Apache child 
> processes may be manageable. By having that one URL be handled in embedded 
> mode, the processes it runs in will be handled under the graceful 
> restart mode of the main Apache child processes.
>
> On 8 May 2022, at 9:27 am, Graham Dumpleton <[email protected]> wrote:
>
> It definitely is an annoying problem. To be honest I don't think I have 
> ever really considered writing my own sub process manager instead of using 
> the Apache other processes management code. I will need to think about why 
> I never considered doing that and how complicated it would be to replicate.
>
> As to an interim solution, have a read of:
>
> http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html
>
> Reason am pointing at that is that if there is only one URL of your 
> application which is writing these files, then you could consider 
> delegating just that one URL to be handled under mod_wsgi embedded mode, 
> rather than in the daemon mode process with the rest of your application 
> code and aren't using prefork MPM. As long as the request handler for that 
> doesn't drag in too much code, the memory cost in Apache child processes 
> may be manageable. By having that one URL be handled in embedded mode, then 
> the processes it runs in will be handled under the graceful restart mode of 
> the main Apache child processes.
>
> Graham
>
> On 8 May 2022, at 9:16 am, Tomi Belan <[email protected]> wrote:
>
> How much work would it take to have true graceful restarts for the 
> mod_wsgi daemon processes?
>
> Current behavior:
> When "apache2ctl graceful" aka "httpd -k graceful" runs, the Apache parent 
> process sends a SIGTERM to each mod_wsgi daemon process, waits up to 3 
> seconds (hardcoded maximum), and sends a SIGKILL to any that are still 
> alive. After they're all dead, it spawns new wsgi processes. This is 
> mentioned in various issues like #383 
> <https://github.com/GrahamDumpleton/mod_wsgi/issues/383> and #124 
> <https://github.com/GrahamDumpleton/mod_wsgi/issues/124>, and in the 
> documentation of WSGIDaemonProcess shutdown-timeout 
> <https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html#:~:text=shutdown%2Dtimeout>.
> In contrast, Apache sends SIGUSR1 to its own worker processes, and 
> whenever one of them exits, Apache spawns a new one. So there should almost 
> always be enough processes ready to serve new connections. (
> https://httpd.apache.org/docs/2.4/stopping.html#graceful)
>
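
To make the escalation described above concrete, here is a rough Python model of "SIGTERM, wait up to N seconds, then SIGKILL". It is only an illustration and not APR's actual code, which is written in C; the function name stop_child and its timeout parameter are invented for this sketch.

```python
import subprocess


def stop_child(proc, timeout=3.0):
    """Ask a child to exit with SIGTERM, then SIGKILL it after `timeout` seconds.

    Returns True if the child exited on its own, False if it had to be killed.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX; the child cannot catch this
        proc.wait()  # reap the child so it does not linger as a zombie
        return False
```

With the timeout as a parameter, a request that needs more than 3 seconds could be given time to finish, but nothing serves new requests while the caller is blocked in wait(), which is why a configurable timeout alone would not be enough.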
> My wishlist for "true" graceful restarts would be:
> 1. Make the shutdown timeout configurable.
> 2. Don't wait until *all* old daemon processes exit. Either spawn 1 new 
> process whenever 1 old process exits, or spawn all N new processes 
> immediately and let the old processes exit when they want.
> 3. Add another signal between the SIGTERM and SIGKILL which throws a 
> Python exception, so that "finally:" blocks have a chance to run.
>
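
Item 3 can be illustrated in a plain Python process as follows. This is a sketch of the general mechanism only, under loud assumptions: the exception class, the helper name and the choice of SIGUSR2 are all hypothetical. It would not work as-is inside mod_wsgi, because mod_wsgi by default ignores signal handlers registered by application code (see the WSGIRestrictSignal directive) and requests do not run in the Python main thread; a real implementation would presumably need something like the C API's PyThreadState_SetAsyncExc instead.

```python
import signal


class ShutdownRequested(Exception):
    """Raised in the main thread when the warning signal arrives."""


def _raise_shutdown(signum, frame):
    raise ShutdownRequested(f"received signal {signum}")


def install_shutdown_exception(signum=signal.SIGUSR2):
    # Python runs signal handlers in the main thread, between bytecode
    # instructions, so the exception surfaces in whatever code the main
    # thread is executing at that moment and unwinds through its
    # "finally:" blocks and context managers.
    signal.signal(signum, _raise_shutdown)
```

After install_shutdown_exception() has been called, sending the chosen signal to the process interrupts the main thread with ShutdownRequested, so a "with open(...)" block in progress gets a chance to flush and close its file before the later SIGKILL.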
> Current code:
> The linked github issues did mention that this behavior is hardcoded deep 
> in Apache and there is nothing mod_wsgi can do, but I wanted to see it 
> myself.
> Actually, the logic is not anywhere in https://github.com/apache/httpd 
> (in particular, it's NOT server/mpm_unix.c 
> <https://github.com/apache/httpd/blob/trunk/server/mpm_unix.c>), but in 
> https://github.com/apache/apr. Specifically the SIGKILL is sent at 
> apr/memory/unix/apr_pools.c#L2810 
> <https://github.com/apache/apr/blob/39c271bca156adee03ff49f864dcce27ae6f5d73/memory/unix/apr_pools.c#L2810>
> and the 3-second timeout is hardcoded at apr/memory/unix/apr_pools.c#L98 
> <https://github.com/apache/apr/blob/39c271bca156adee03ff49f864dcce27ae6f5d73/memory/unix/apr_pools.c#L98>.
> Any subprocess registered with apr_pool_note_subprocess(..., 
> APR_KILL_AFTER_TIMEOUT) will use that timeout. mod_wsgi calls that function 
> at server/mod_wsgi.c#L10566 
> <https://github.com/GrahamDumpleton/mod_wsgi/blob/dabb377a29cba190c6c48659e3f81df685e47aad/src/server/mod_wsgi.c#L10566>.
> The pool where the subprocesses are registered is the pconf pool given to 
> wsgi_hook_init. I guess they are probably killed when Apache 
> calls apr_pool_clear(process->pconf) in reset_process_pconf() in main.c, 
> but I haven't verified this.
> The normal worker process logic is implemented in each MPM. E.g. prefork 
> replaces dead children with new live children at 
> server/mpm/prefork/prefork.c#L1145 
> <https://github.com/apache/httpd/blob/6596870481dc1f0e28ac59c52455691fee9c8524/server/mpm/prefork/prefork.c#L1145>, 
> I think.
>
> My thoughts: (please correct me if I'm wrong)
> This seems pretty hard. I definitely see why it wasn't done yet. And maybe 
> it's not worth the complexity even if it is possible.
> Originally I hoped I could just write an Apache patch to replace the 
> hardcoded timeout value with a config file option. But the logic is in a 
> library (apr) so I can't read Apache config directly, and there might be 
> API/ABI concerns with extending apr_pool_note_subprocess(). And anyway, 
> *only* making the timeout configurable wouldn't be enough because the 
> server would just wait without any mod_wsgi process accepting new 
> connections.
> I think the best chance of success would be to stop using apr_pool_t and 
> apr_pool_note_subprocess() for process management in mod_wsgi. After all, 
> it's not the only way: Either use fork() etc directly, like the mpm 
> modules, or at least, keep apr_pool_t but use our own custom pool rather 
> than "pconf" - most likely saved with ap_retained_data_get(). That way 
> mod_wsgi would have more control. When it learns the server is gracefully 
> restarting, it will spawn new daemon processes immediately with a new 
> socket name, and timeout/kill the old processes later in the background. 
> When it learns the server is stopping, it will block until the children are 
> terminated.
>
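
The restart strategy described above (spawn the new workers first, retire the old ones in the background, block only on a full stop) can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not a design for mod_wsgi: the WorkerPool class and all of its names are invented here, and it leaves out the listener sockets, the new socket name, and the hard part of learning whether Apache is restarting or stopping.

```python
import subprocess
import threading
import time


class WorkerPool:
    """Toy process manager that replaces workers without a gap."""

    def __init__(self, command, count, shutdown_timeout=30.0):
        self.command = command
        self.count = count
        self.shutdown_timeout = shutdown_timeout
        self.workers = self._spawn()
        self._reapers = []

    def _spawn(self):
        return [subprocess.Popen(self.command) for _ in range(self.count)]

    def _stop(self, procs):
        # SIGTERM every process first, then give the group a shared deadline.
        for proc in procs:
            proc.terminate()
        deadline = time.monotonic() + self.shutdown_timeout
        for proc in procs:
            try:
                proc.wait(timeout=max(0.0, deadline - time.monotonic()))
            except subprocess.TimeoutExpired:
                proc.kill()  # deadline passed: fall back to SIGKILL
                proc.wait()

    def graceful_restart(self):
        # Start the replacements before touching the old workers.
        old, self.workers = self.workers, self._spawn()
        reaper = threading.Thread(target=self._stop, args=(old,))
        reaper.start()  # old workers drain in the background
        self._reapers.append(reaper)

    def shutdown(self):
        # Full stop: block until every child, old or new, is gone.
        self._stop(self.workers)
        self.workers = []
        for reaper in self._reapers:
            reaper.join()
        self._reapers = []
```

The point of the sketch is the ordering in graceful_restart(): the call returns as soon as the new workers exist, while the timeout and SIGKILL for the old ones happen on a separate thread.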
> Does this make sense? Are there any glaring issues I've overlooked?
>
> If the strategy sounds sensible, and if I have enough time, I might try to 
> code this. Is it something you would be potentially interested in merging? 
> (not too much code review burden, maintenance burden, or risk of new bugs)
>
> Just for completeness, the backstory of why I want this:
> My Python app writes files to disk. Sadly, some requests take more than 3 
> seconds. If it is killed with SIGKILL, the file buffer data is not written, 
> resulting in a corrupted empty/truncated file. A later batch process fails 
> when it tries to read every file in the output directory. I know there are 
> many workarounds, such as using a temporary file and atomically renaming 
> it, but I became curious about the root cause.
> The server gracefully restarts every day because of log rotation, using 
> Ubuntu's default logrotate config. After reading #383 
> <https://github.com/GrahamDumpleton/mod_wsgi/issues/383> I also looked at 
> Apache's rotatelogs 
> <https://httpd.apache.org/docs/2.4/programs/rotatelogs.html>, but it 
> doesn't support compression, so I'd rather stay with logrotate.
>
> Versions: Apache 2.4.41 with mpm_prefork, mod_wsgi 4.6.8 in daemon mode, 
> Python 3.8.10, Ubuntu 20.04. (old but I don't think this matters)
>
> Tomi
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/modwsgi/CACUV5oemMwr1YzKe%3D0JrBTma%2BwQcvyaN5Jzc5uz_Kf31mK12ng%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/modwsgi/CACUV5oemMwr1YzKe%3D0JrBTma%2BwQcvyaN5Jzc5uz_Kf31mK12ng%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>
>
>
