Fixing my bad edit at the end so it makes proper sense:

The reason I am pointing at that is that if there is only one URL of your application which is writing these files, then you could consider delegating just that one URL to be handled under mod_wsgi embedded mode, rather than in the daemon mode process with the rest of your application code. As long as the request handler for that doesn't drag in too much code, and you aren't using prefork MPM, the memory cost in Apache child processes may be manageable. By having that one URL be handled in embedded mode, the processes it runs in will be handled under the graceful restart mode of the main Apache child processes.
> On 8 May 2022, at 9:27 am, Graham Dumpleton <[email protected]>
> wrote:
>
> It definitely is an annoying problem. To be honest I don't think I have ever
> really considered writing my own sub process manager instead of using the
> Apache other processes management code. I will need to think about why I
> never considered doing that and how complicated would be to replicate.
>
> As to an interim solution, have a read of:
>
> http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html
>
> Reason am pointing at that is that if there is only one URL of your
> application which is writing these files, then you could consider delegating
> just that one URL to be handled under mod_wsgi embedded mode, rather than in
> the daemon mode process with the rest of your application code and aren't
> using preform MPM. As long as the request handler for that doesn't drag in
> too much code, the memory cost in Apache child processes may be manageable.
> By having that one URL be handled in daemon mode, then the processes it runs
> in will be handled under the graceful restart mode of the main Apache child
> processes.
>
> Graham
>
>> On 8 May 2022, at 9:16 am, Tomi Belan <[email protected]> wrote:
>>
>> How much work would it take to have true graceful restarts for the mod_wsgi
>> daemon processes?
>>
>> Current behavior:
>> When "apache2ctl graceful" aka "httpd -k graceful" runs, the Apache parent
>> process sends a SIGTERM to each mod_wsgi daemon process, waits up to 3
>> seconds (hardcoded maximum), and sends a SIGKILL to any that are still
>> alive. After they're all dead, it spawns new wsgi processes.
>> This is mentioned in various issues like #383
>> <https://github.com/GrahamDumpleton/mod_wsgi/issues/383> and #124
>> <https://github.com/GrahamDumpleton/mod_wsgi/issues/124>, and in the
>> documentation of WSGIDaemonProcess shutdown-timeout
>> <https://modwsgi.readthedocs.io/en/master/configuration-directives/WSGIDaemonProcess.html#:~:text=shutdown%2Dtimeout>.
>> In contrast, Apache sends SIGUSR1 to its own worker processes, and whenever
>> one of them exits, Apache spawns a new one. So there should almost always be
>> enough processes ready to serve new connections.
>> (https://httpd.apache.org/docs/2.4/stopping.html#graceful)
>>
>> My wishlist for "true" graceful restarts would be:
>> 1. Make the shutdown timeout configurable.
>> 2. Don't wait until *all* old daemon processes exit. Either spawn 1 new
>>    process whenever 1 old process exits, or spawn all N new processes
>>    immediately and let the old processes exit when they want.
>> 3. Add another signal between the SIGTERM and SIGKILL which throws a Python
>>    exception, so that "finally:" blocks have a chance to run.
>>
>> Current code:
>> The linked github issues did mention that this behavior is hardcoded deep in
>> Apache and there is nothing mod_wsgi can do, but I wanted to see it myself.
>> Actually, the logic is not anywhere in https://github.com/apache/httpd
>> (in particular, it's NOT server/mpm_unix.c
>> <https://github.com/apache/httpd/blob/trunk/server/mpm_unix.c>), but in
>> https://github.com/apache/apr. Specifically the SIGKILL is sent at
>> apr/memory/unix/apr_pools.c#L2810
>> <https://github.com/apache/apr/blob/39c271bca156adee03ff49f864dcce27ae6f5d73/memory/unix/apr_pools.c#L2810>
>> and the 3 seconds timeout is hardcoded at apr/memory/unix/apr_pools.c#L98
>> <https://github.com/apache/apr/blob/39c271bca156adee03ff49f864dcce27ae6f5d73/memory/unix/apr_pools.c#L98>.
>> Any subprocess registered with apr_pool_note_subprocess(...,
>> APR_KILL_AFTER_TIMEOUT) will use that timeout. mod_wsgi calls that function
>> at server/mod_wsgi.c#L10566
>> <https://github.com/GrahamDumpleton/mod_wsgi/blob/dabb377a29cba190c6c48659e3f81df685e47aad/src/server/mod_wsgi.c#L10566>.
>> The pool where the subprocesses are registered is the pconf pool given to
>> wsgi_hook_init. I guess they are probably killed when Apache calls
>> apr_pool_clear(process->pconf) in reset_process_pconf() in main.c, but I
>> haven't verified this.
>> The normal worker process logic is implemented in each mpm. E.g. prefork
>> replaces dead children with new live children at
>> server/mpm/prefork/prefork.c#L1145
>> <https://github.com/apache/httpd/blob/6596870481dc1f0e28ac59c52455691fee9c8524/server/mpm/prefork/prefork.c#L1145>,
>> I think.
>>
>> My thoughts: (please correct me if I'm wrong)
>> This seems pretty hard. I definitely see why it wasn't done yet. And maybe
>> it's not worth the complexity even if it is possible.
>> Originally I hoped I could just write an Apache patch to replace the
>> hardcoded timeout value with a config file option. But the logic is in a
>> library (apr) so I can't read Apache config directly, and there might be
>> API/ABI concerns with extending apr_pool_note_subprocess(). And anyway,
>> *only* making the timeout configurable wouldn't be enough because the server
>> would just wait without any mod_wsgi process accepting new connections.
>> I think the best chance of success would be to stop using apr_pool_t and
>> apr_pool_note_subprocess() for process management in mod_wsgi. After all,
>> it's not the only way: Either use fork() etc directly, like the mpm modules,
>> or at least, keep apr_pool_t but use our own custom pool rather than "pconf"
>> - most likely saved with ap_retained_data_get(). That way mod_wsgi would
>> have more control.
>> When it learns the server is gracefully restarting, it
>> will spawn new daemon processes immediately with a new socket name, and
>> timeout/kill the old processes later in the background. When it learns the
>> server is stopping, it will block until the children are terminated.
>>
>> Does this make sense? Are there any glaring issues I've overlooked?
>>
>> If the strategy sounds sensible, and if I have enough time, I might try to
>> code this. Is it something you would be potentially interested in merging?
>> (not too much code review burden, maintenance burden, or risk of new bugs)
>>
>> Just for completeness, the backstory of why I want this:
>> My Python app writes files to disk. Sadly, some requests take more than 3
>> seconds. If it is killed with SIGKILL, the file buffer data is not written,
>> resulting in a corrupted empty/truncated file. A later batch process fails
>> when it tries to read every file in the output directory. I know there are
>> many workarounds, such as using a temporary file and atomically renaming it,
>> but I became curious about the root cause.
>> The server gracefully restarts every day because of log rotation, using
>> Ubuntu's default logrotate config. After reading #383
>> <https://github.com/GrahamDumpleton/mod_wsgi/issues/383> I also looked at
>> Apache's rotatelogs
>> <https://httpd.apache.org/docs/2.4/programs/rotatelogs.html>, but it doesn't
>> support compression, so I'd rather stay with logrotate.
>>
>> Versions: Apache 2.4.41 with mpm_prefork, mod_wsgi 4.6.8 in daemon mode,
>> Python 3.8.10, Ubuntu 20.04. (old but I don't think this matters)
>>
>> Tomi

--
You received this message because you are subscribed to the Google Groups "modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/modwsgi/CBE9D12E-44EC-4B0D-856F-8DC790213E14%40gmail.com.
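
For reference, the workaround mentioned in the quoted message (write to a temporary file, then atomically rename it) could look roughly like the sketch below. This is only an illustration, not code from the application in question or from mod_wsgi: the function name atomic_write and the ".tmp-" prefix are made up for the example. It assumes the temporary file can live in the same directory as the final file, so the rename stays on one filesystem.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so that readers never see a partial file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that the final
    # rename does not cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        # os.replace() swaps the name in one step on POSIX systems, so the
        # target is either the old complete file or the new complete file.
        os.replace(tmp_path, path)
    except BaseException:
        # On any Python-level failure, remove the leftover temporary file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process receives SIGKILL part way through, the cleanup code never runs and a ".tmp-*" file is left behind, but the final name never points at truncated data. The batch job reading the output directory would then need to skip names with that prefix, or the temporary files could go in a separate directory on the same filesystem.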
