Thanks for the vhost suggestion. I hadn't thought of that, but it turned out we didn't need it.
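For context on the test described next, a hello world WSGI script of the kind shown in the mod_wsgi quick configuration guide looks roughly like the following. This is a sketch of that style of script, not a copy of the file deployed on the server; it imports nothing and starts no threads of its own.

```python
def application(environ, start_response):
    # Fixed plain-text response; no imports, no database, no background threads.
    status = '200 OK'
    output = b'Hello World!'

    response_headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(output))),
    ]
    start_response(status, response_headers)

    # A WSGI application returns an iterable of byte strings.
    return [output]
```

With a script this small there is no third party C extension loaded by the app itself, which is what makes a crash while idle surprising.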
After more testing, it turns out that a .wsgi file containing only the simple hello world script from https://modwsgi.readthedocs.io/en/develop/user-guides/quick-configuration-guide.html#wsgi-application-script-file exhibits the problem: with no incoming requests at all, the mod_wsgi process sits there for anywhere from a few minutes to a few hours and then dies with a segmentation fault. Any idea what else I could look at?

On Wednesday, October 26, 2022 at 12:10:38 AM UTC-6 Graham Dumpleton wrote:

> The LogLevel can be set just in a VirtualHost context if it is under a separate
> host. If you are then using separate log files for different VirtualHosts
> it should at least be semi segmented from everything else.
>
> On 26 Oct 2022, at 4:34 pm, stuart mcgraw <smcg...@gmail.com> wrote:
>
> I was grepping all the log files for any messages within a minute or two
> before the segfaults so if a request was logged anywhere I should have seen
> it.
>
> I'll mention the LogLevel Debug setting to them, but there were complaints
> before that LogLevel Info was too noisy so I'm not sure that will fly.
>
> I'll look into the possibility of errant app threads and post back if
> anything turns up. Thanks very much for your help with this.
> On Tuesday, October 25, 2022 at 11:05:28 PM UTC-6 Graham Dumpleton wrote:
>
>> The only way I can think of that you may get a request which wasn't
>> logged, is if an internal Apache request was triggered via an internal
>> redirect from another Apache module. There still has to be an original
>> request, but it would be logged as a different request URL to where it got
>> internally redirected.
>>
>> Since you are using mod_wsgi daemon mode you can likely see better
>> evidence of all requests being handled if you turn on verbose debugging mode,
>> but it would be quite noisy.
>>
>> LogLevel debug
>> WSGIVerboseDebugging On
>>
>> Graham
>>
>> On 26 Oct 2022, at 3:33 pm, stuart mcgraw <smcg...@gmail.com> wrote:
>>
>> I didn't compile mod_wsgi myself so I can't say 100%, but the person who
>> did said so, and there is a source directory on the machine named
>> mod_wsgi-4.9.4 with a mod_wsgi.so file whose sha1 checksum matches that of
>> the file in the apache modules/ directory, so I'd say I'm 99.9% sure.
>>
>> But the chances of >1Gi requests being made seem pretty small. The urls
>> haven't been publicized, there are only a handful of known users accessing
>> the urls infrequently as testers and nothing in the application would
>> generate requests of that magnitude.
>>
>> This may be out of scope for you, but are you aware of any (reasonably
>> normal) circumstances under which a mod_wsgi process could receive a
>> request that wasn't logged by Apache? Or perhaps I could modify the
>> mod_wsgi source code to print a message to a file when a request was
>> received (which I could then correlate with the Apache logs to answer the
>> question.) Because usage is very light and this is only for short term
>> debugging, I don't think locking or anything fancy would be needed?
>>
>> And I am still wondering about library mismatches or conflicts since
>> Apache, Python, mod_wsgi and C-based Python modules (eg psycopg2) used by
>> the app were all built from source. Is it possible that some version
>> mismatch there causes some memory corruption that later manifests when
>> one of the mod_wsgi housekeeping threads runs? I would like if possible to
>> rule this out or at least put it at the bottom of the list.
>> On Tuesday, October 25, 2022 at 8:35:07 PM UTC-6 Graham Dumpleton wrote:
>>
>>> I know you said you were using mod_wsgi/4.9.4, but are you absolutely
>>> sure?
>>> Apache/2.4.54 made a breaking change by changing the default for the
>>> LimitRequestBody directive, which would cause the mod_wsgi daemon process to
>>> crash when sent large request bodies over 1Gi. This was fixed in
>>> version 4.9.4, but am wondering whether your production system has an older
>>> version than your development systems use and you just aren't aware of that.
>>>
>>> https://modwsgi.readthedocs.io/en/master/release-notes/version-4.9.4.html#bugs-fixed
>>>
>>> As to background threads, mod_wsgi has a couple of background threads
>>> which check for idle activity, deadlocks and things, but they touch so
>>> little they have never caused issues in the past. Beyond that, the request
>>> handler threads themselves should be stuck on a select loop if no requests
>>> are happening.
>>>
>>> On 26 Oct 2022, at 1:12 pm, stuart mcgraw <smcg...@gmail.com> wrote:
>>>
>>> Again, thanks for those suggestions.
>>>
>>> The OOM killer seems not to be an issue. I've been told there are no
>>> signs of it in the system logs and no signs of memory problems via
>>> monitoring during normal operations.
>>>
>>> Nor did "WSGIDestroyInterpreter Off" have any effect; the segfaults are
>>> still occurring after that was added and Apache restarted.
>>>
>>> My understanding of how mod_wsgi works is pretty sketchy. IIUC you are
>>> saying that the mod_wsgi processes are sitting there, waiting on a select()
>>> call or the like, to receive a request from the mod_wsgi code within
>>> Apache; and in that state they cannot simply spontaneously crash -- it must
>>> be that either the process received a request from Apache (via the
>>> mod_wsgi module) or there is some independent thread running in the Python
>>> part of the mod_wsgi process (which is running my wsgi app) that is causing
>>> the crash?
>>>
>>> I based my claim that there were no requests coincident with the
>>> segfaults on the lack of log messages within a second or two of some
>>> of the segfaults.
>>> (It's a moderately busy server so of course there were
>>> also some close in time but for seemingly unrelated pages: eg, python, php
>>> or c cgi, or html.) Is it possible that the mod_wsgi processes are getting
>>> woken up by something that does not produce an apache access log entry?
>>>
>>> I'm still working on the python thread hypothesis (this is a production
>>> server so changes aren't easy.)
>>> On Sunday, October 23, 2022 at 2:12:02 PM UTC-6 Graham Dumpleton wrote:
>>>
>>>> How much memory do the processes use? Maybe the system OOM process
>>>> killer is killing the processes as they consume lots of memory and the
>>>> system thinks it is running low. There were some potential problems
>>>> introduced with Python 3.9 with how processes are shut down and that causes
>>>> embedded systems to fail on shutdown.
>>>>
>>>> See:
>>>>
>>>> https://modwsgi.readthedocs.io/en/master/release-notes/version-4.9.1.html#features-changed
>>>>
>>>> You can try setting:
>>>>
>>>> WSGIDestroyInterpreter Off
>>>>
>>>> as mentioned in those change notes and see if it goes away.
>>>>
>>>> Other than that, if you are confident that no new requests are
>>>> arriving, can only suggest you work out if there are background threads
>>>> running in Python.
>>>>
>>>> You can do that by adding code as described in:
>>>>
>>>> https://modwsgi.readthedocs.io/en/master/user-guides/debugging-techniques.html#extracting-python-stack-traces
>>>>
>>>> and triggering a dump of running threads by touching a file in the file
>>>> system.
>>>>
>>>> It might also be helpful if you can work out how to have the system
>>>> preserve core dumps from Apache so they can be used to extract a true
>>>> process stack trace as that may give a clue.
>>>>
>>>> Graham
>>>>
>>>> On 24 Oct 2022, at 3:51 am, stuart mcgraw <smcg...@gmail.com> wrote:
>>>>
>>>> Thanks for that suggestion.
>>>> I passed it on to the site admin and
>>>> he made the "application-group=%{GLOBAL}" change, but unfortunately it
>>>> made no difference; the segfaults are still occurring as before. Is there
>>>> anything else I can look at? The current configuration is:
>>>>
>>>> WSGIDaemonProcess jmwsgi processes=2 threads=10 \
>>>>     display-name=apache2-jmwsgi locale=en_US.UTF-8 lang=en_US.UTF-8
>>>> WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
>>>>     process-group=jmwsgi application-group=%{GLOBAL}
>>>>
>>>> Would changing to "processes=N threads=1" or "processes=1 threads=N"
>>>> provide any useful info? Apache, mod_wsgi and the other web server
>>>> components were all built there (ie, they are not from distro-supplied
>>>> packages.) Are the symptoms consistent with a mismatched library or some
>>>> other build configuration issue? Or conversely, maybe they make that
>>>> unlikely?
>>>> On Friday, October 21, 2022 at 11:48:51 PM UTC-6 Graham Dumpleton wrote:
>>>>
>>>>> Try changing it to:
>>>>>
>>>>> WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
>>>>>     process-group=jmwsgi application-group=%{GLOBAL}
>>>>>
>>>>> You are possibly using a third party Python module which isn't
>>>>> designed to work in Python sub interpreters. That application group value
>>>>> forces the main Python interpreter context to be used, which can avoid
>>>>> problems with crashes, or thread deadlocks when such broken modules are
>>>>> used.
>>>>>
>>>>> https://modwsgi.readthedocs.io/en/master/user-guides/application-issues.html#python-simplified-gil-state-api
>>>>>
>>>>> That option on WSGIScriptAlias has the same effect as WSGIApplicationGroup
>>>>> but is more specific. For the same reason, your use of WSGIProcessGroup is
>>>>> redundant as the process group setting on WSGIScriptAlias takes precedence.
>>>>>
>>>>> Graham
>>>>>
>>>>> On 22 Oct 2022, at 2:35 pm, stuart mcgraw <smcg...@gmail.com> wrote:
>>>>>
>>>>> My apologies for the delayed response, I thought I had my google email
>>>>> forwarded to my main email account but... :-(
>>>>>
>>>>> My intent was that the processes run in daemon mode. I had missed the
>>>>> info about the WSGIRestrictEmbedded directive when I went through the
>>>>> doc; I'll ask the admin there to add that. The full configuration for wsgi is:
>>>>>
>>>>> WSGIDaemonProcess jmwsgi processes=2 threads=10 \
>>>>>     display-name=apache2-jmwsgi locale=en_US.UTF-8 lang=en_US.UTF-8
>>>>> WSGIProcessGroup jmwsgi
>>>>> WSGIScriptAlias /jmwsgi /usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi \
>>>>>     process-group=jmwsgi
>>>>> # Serve static files directly without using the app.
>>>>> Alias /jmwsgi/web/ /usr/local/apache2/jmdictdb/
>>>>> <Directory /usr/local/apache2/jmdictdb>
>>>>>     DirectoryIndex disabled
>>>>>     Require all granted
>>>>> </Directory>
>>>>>
>>>>> The server has a number of virtual hosts and there were a few mod_wsgi
>>>>> "Loading Python" messages in the error log for one of them (for ssl) but
>>>>> nothing looking errorish and only a few, nowhere near the number of
>>>>> segfault messages:
>>>>>
>>>>> [Sat Oct 01 07:50:12.090697 2022] [wsgi:info] [pid 731154:tid 140442461062912] [remote *.*.*.*:40566] mod_wsgi (pid=731154, process='jmwsgi', application='www.edrdg.org|/jmwsgi'): Loading Python script file '/usr/local/apache2/jmdictdb/wsgifiles/jmdictdb.wsgi'.
>>>>>
>>>>> But the wsgi configuration stuff is outside all the virtual hosts.
>>>>>
>>>>> When the server starts, there are a couple of messages in the main error
>>>>> log file like:
>>>>>
>>>>> [Sat Oct 01 06:42:26.499086 2022] [wsgi:info] [pid 731041:tid 140442622753728] mod_wsgi (pid=731041): Starting process 'jmwsgi' with uid=33, gid=33 and threads=10.
>>>>> [Sat Oct 01 06:42:26.499518 2022] [wsgi:info] [pid 731039:tid 140442622753728] mod_wsgi (pid=731039): Starting process 'jmwsgi' with uid=33, gid=33 and threads=10.
>>>>>
>>>>> and these are followed/interleaved with the "Initializing Python" and
>>>>> "Attach interpreter" messages but after server startup the messages are
>>>>> limited to the sets of three I showed: "Initializing Python" and "Attach
>>>>> interpreter" followed sometime later by the Segmentation fault.
>>>>>
>>>>> Does any of that help?
>>>>> On Sunday, October 16, 2022 at 4:16:09 PM UTC-6 Graham Dumpleton wrote:
>>>>>
>>>>>> What other mod_wsgi configuration is there besides
>>>>>> the WSGIDaemonProcess directive? That alone only creates a mod_wsgi
>>>>>> daemon process group, but does not tell mod_wsgi to use it. Thus I cannot tell
>>>>>> whether you are using embedded mode or daemon mode. The logs are also
>>>>>> odd in that I would expect to see other messages in there around when
>>>>>> processes are created if using daemon mode, plus an indication of whether a
>>>>>> message is being generated from an Apache child process or mod_wsgi daemon
>>>>>> process.
>>>>>>
>>>>>> So can you supply the other parts of the mod_wsgi configuration so
>>>>>> I can see if you are properly using daemon mode or not. Also look for logs from
>>>>>> mod_wsgi in any per virtual host specific error log file and not just
>>>>>> the main Apache error log if you separate them. Finally, if you are only
>>>>>> intending to use mod_wsgi daemon mode, ensure you add the directive:
>>>>>>
>>>>>> WSGIRestrictEmbedded On
>>>>>>
>>>>>> outside of all VirtualHost definitions so that any attempt to
>>>>>> initialise/use Python in main Apache child processes is disabled.
>>>>>>
>>>>>> Graham
>>>>>>
>>>>>> On 17 Oct 2022, at 8:07 am, stuart mcgraw <smcg...@gmail.com> wrote:
>>>>>>
>>>>>> I am the author of a Flask application running under Linux/Apache
>>>>>> mod_wsgi that is experiencing intermittent, random segmentation faults.
>>>>>>
>>>>>> What is unusual is that the mod_wsgi process segfaults are occurring
>>>>>> not at startup when mod_wsgi is loaded, or when an incoming request
>>>>>> accesses the app, but when the wsgi processes are just sitting there,
>>>>>> quiescent.
>>>>>>
>>>>>> From a user's point of view, everything looks fine, the mod_wsgi
>>>>>> processes and the app respond with the right results with no sign of
>>>>>> trouble at the client's browser. But looking at the Apache logs shows
>>>>>> the wsgi processes periodically segfaulting and getting restarted with no
>>>>>> correlated incoming requests. They die sometimes after running for a
>>>>>> few minutes, sometimes after a few hours. There are no incoming requests to
>>>>>> the wsgi app logged near the time of these crashes.
>>>>>>
>>>>>> For example:
>>>>>> [Mon May 30 22:35:43.040387 2022] [wsgi:info] [pid 2575903:tid 139929303559104] mod_wsgi (pid=2575903): Initializing Python.
>>>>>> [Mon May 30 22:35:43.099053 2022] [wsgi:info] [pid 2575903:tid 139929303559104] mod_wsgi (pid=2575903): Attach interpreter ''.
>>>>>> [Tue May 31 01:29:06.434000 2022] [core:notice] [pid 2876203:tid 139929303559104] AH00052: child pid 2511562 exit signal Segmentation fault (11)
>>>>>> [Tue May 31 01:29:07.466268 2022] [wsgi:info] [pid 2605661:tid 139929303559104] mod_wsgi (pid=2605661): Initializing Python.
>>>>>> [Tue May 31 01:29:07.517413 2022] [wsgi:info] [pid 2605661:tid 139929303559104] mod_wsgi (pid=2605661): Attach interpreter ''.
>>>>>> [Tue May 31 04:14:59.405491 2022] [core:notice] [pid 2876203:tid 139929303559104] AH00052: child pid 2575903 exit signal Segmentation fault (11)
>>>>>>
>>>>>> My wsgi app is still being tested so other than infrequent requests
>>>>>> generated by me and a few other people there is very little traffic to
>>>>>> it. However the web server itself is handling some continuous moderate
>>>>>> volume of traffic to other apps including to C, Python and PHP CGI apps.
>>>>>>
>>>>>> What I know about the environment (if any other info would be useful
>>>>>> I'll try and dig it up):
>>>>>>
>>>>>> $ cat /etc/*release
>>>>>> PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
>>>>>>
>>>>>> Apache, mod_wsgi, python were all built from source by the site's
>>>>>> administrator.
>>>>>>
>>>>>> There are (at least) two Pythons on the system:
>>>>>> /usr/bin/python3 -- 3.9.2
>>>>>> /usr/local/bin/python3 -- 3.10.1
>>>>>>
>>>>>> Apache/mod_wsgi was supposedly built against python-3.10. From
>>>>>> the http server header:
>>>>>> Apache/2.4.54 (Unix) OpenSSL/1.1.1n mod_wsgi/4.9.4 Python/3.10 PHP/7.4.23
>>>>>>
>>>>>> The Apache .conf file uses:
>>>>>> WSGIDaemonProcess myapp processes=2 threads=10 \
>>>>>>     display-name=apache2-myapp locale=en_US.UTF-8 lang=en_US.UTF-8
>>>>>>
>>>>>> $ /usr/local/apache2/bin/httpd -V
>>>>>> Server version: Apache/2.4.54 (Unix)
>>>>>> Server built: Oct 13 2022 00:07:38
>>>>>> Server's Module Magic Number: 20120211:124
>>>>>> Server loaded: APR 1.6.5, APR-UTIL 1.6.1, PCRE 10.36 2020-12-04
>>>>>> Compiled using: APR 1.6.5, APR-UTIL 1.6.1, PCRE 10.36 2020-12-04
>>>>>> Architecture: 64-bit
>>>>>> Server MPM: event
>>>>>>   threaded: yes (fixed thread count)
>>>>>>   forked: yes (variable process count)
>>>>>> Server compiled with....
>>>>>>  -D APR_HAS_SENDFILE
>>>>>>  -D APR_HAS_MMAP
>>>>>>  -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
>>>>>>  -D APR_USE_SYSVSEM_SERIALIZE
>>>>>>  -D APR_USE_PTHREAD_SERIALIZE
>>>>>>  -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
>>>>>>  -D APR_HAS_OTHER_CHILD
>>>>>>  -D AP_HAVE_RELIABLE_PIPED_LOGS
>>>>>>  -D DYNAMIC_MODULE_LIMIT=256
>>>>>>  -D HTTPD_ROOT="/usr/local/apache2"
>>>>>>  -D SUEXEC_BIN="/usr/local/apache2/bin/suexec"
>>>>>>  -D DEFAULT_PIDLOG="logs/httpd.pid"
>>>>>>  -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
>>>>>>  -D DEFAULT_ERRORLOG="logs/error_log"
>>>>>>  -D AP_TYPES_CONFIG_FILE="conf/mime.types"
>>>>>>  -D SERVER_CONFIG_FILE="conf/httpd.conf"
>>>>>>
>>>>>> $ bin/httpd -M
>>>>>> Loaded Modules:
>>>>>>  core_module (static)
>>>>>>  so_module (static)
>>>>>>  http_module (static)
>>>>>>  mpm_event_module (static)
>>>>>>  authz_core_module (shared)
>>>>>>  authz_host_module (shared)
>>>>>>  unixd_module (shared)
>>>>>>  dir_module (shared)
>>>>>>  access_compat_module (shared)
>>>>>>  env_module (shared)
>>>>>>  alias_module (shared)
>>>>>>  log_config_module (shared)
>>>>>>  ssl_module (shared)
>>>>>>  mime_module (shared)
>>>>>>  socache_shmcb_module (shared)
>>>>>>  setenvif_module (shared)
>>>>>>  cgid_module (shared)
>>>>>>  userdir_module (shared)
>>>>>>  headers_module (shared)
>>>>>>  rewrite_module (shared)
>>>>>>  autoindex_module (shared)
>>>>>>  negotiation_module (shared)
>>>>>>  dav_module (shared)
>>>>>>  deflate_module (shared)
>>>>>>  info_module (shared)
>>>>>>  status_module (shared)
>>>>>>  wsgi_module (shared)
>>>>>>  evasive24_module (shared)
>>>>>>  php7_module (shared)
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "modwsgi" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to modwsgi+u...@googlegroups.com.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/modwsgi/e06f789f-6023-417e-8b10-1f570adc069cn%40googlegroups.com
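For reference, the background-thread check suggested earlier in the thread can be approximated with the standard library alone. The following is a minimal sketch and not the code from the mod_wsgi debugging guide: the function name `format_thread_stacks` is made up for illustration, and it assumes it is called from inside the daemon process (for example from a request handler or a monitor thread added for testing).

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current stack of every Python thread."""
    # Map thread ids to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() returns {thread_id: topmost frame} for all threads.
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s)" % (thread_id, names.get(thread_id, "unknown")))
        lines.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(lines)


if __name__ == "__main__":
    # Standalone run: writes the report for this process to stderr.
    print(format_thread_stacks(), file=sys.stderr)
```

Any thread in the report that belongs to neither the hello world script nor the request handling would be a candidate for whatever is touching memory while the process is idle. Threads created purely in C code that never run Python will not appear, so a native core dump is still the more complete picture.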