> Sorry for the slow reply, been on long holiday.

Hope you had a good holiday ;) And sorry for an even slower reply ;)

> *Why are you using maximum-requests and inactivity-timeout?*
> I would always recommend against using these options if there is no
> real requirement, as unnecessary restarts should always be avoided
> because the startup costs of loading up fat Python web applications
> can significantly impact server performance if your server is under
> constant load.
>
> IOW, the last thing you want to happen if you are under load,
> especially a load spike, is to restart the application due to hitting
> maximum-requests.

We are running 5 low usage Django sites on a single host. Our system admin is worried about one of the sites getting out of control and affecting the others, for example through a memory leak. This is why we're using maximum-requests. I'm not sure about the reason for inactivity-timeout - I'll ask.

These sites are not public web-facing sites, and they have small user bases. We're more interested in reliability and low maintenance than performance under heavy load.

If the daemon processes are not periodically restarted using maximum-requests, should they be periodically restarted some other way? The thought of having a Python VM running for years makes me a bit nervous. I guess a cron job and an Apache reload perhaps?

> Anyway, I don't recollect specifically a case where a daemon process
> wouldn't come back after being killed off. Are you absolutely sure,
> from running 'ps' and checking process IDs, that they didn't exist in
> any state in the process table? *Are the processes perhaps still there
> and haven't actually shut down properly?*

The processes are definitely gone. We have been keeping an eye on the processes in ps. If Django or Python crashed, I would expect to see some error messages in our logs.

I had a brief look at the source code for mod_wsgi to understand what is happening. Based on the info log messages it appears that the restart fails whenever the "shutdown reaper" runs.
As I understand it, the reaper runs if Django takes too long to shut down. *Is mod_wsgi supposed to restart a process after the "reaper" kills it?*

We've also experimented with increasing the shutdown-timeout, so that the reaper is less likely to kill the process. This improved things but did not make the problem go away entirely.

> The message:
>
> [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting process
> 'XXusername-omittedXX'.
>
> indicates there certainly wasn't a clean shutdown, possibly because of
> a threading deadlock or hung request thread. However, when you get
> that message there is a C level call to exit() immediately after so
> the process should die.
>
> Technically calling exit() doesn't absolutely guarantee shutdown as a
> C level atexit() handler could block, but neither Apache nor mod_wsgi
> should be setting up any of those.
>
> The problem with maximum-requests and inactivity-timeout is that they
> are a self initiated restart and a graceful one at that. There is
> therefore no absolute failsafe if exit() doesn't work, as there would
> be if it was an Apache restart. In the case of an Apache restart, the
> Apache parent process will send a SIGKILL to the child processes if
> they haven't exited after 3 seconds. For a self initiated restart, the
> parent process doesn't know it is restarting, so if something prevents
> the process from properly dying, then it will not be notified and will
> not replace the process.
>
> In summary, validate for sure whether the daemon processes still exist
> or not in the process table and in what state.
>
> If they do still exist, try and use gdb as described in:
>
> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB
>
> to work out where they are blocked.
>
> That all said, although I don't recollect seeing this specific issue
> with maximum-requests, I have seen a similar outcome of processes not
> handling requests in the past.
> Generally this was always because of application
> level code blocking and exhausting threads. There were some unexplained
> cases, but these seemed to vanish with no changes to mod_wsgi. There
> was suspicion though at the time that for some reason the distro
> Apache was causing problems. From memory this was for Apache in patch
> version range around 15-17 of 2.2 and you are using 16. *Is there a
> reason why you can't upgrade to a newer Apache version than what you
> are using?*

I assume this is just the stock Debian Apache version. I'm not the system admin, but my guess is that with a newer version we'd need to apply security patches manually, rather than through the package management system.

We're still experiencing the problem. We are monitoring it and restarting processes when needed. I've started looking into this again, and will keep you and the list up to date with how it goes.

It sounds like the best approach is to remove the maximum-requests and inactivity-timeout settings, and to set up a cron job to periodically restart the daemon processes, perhaps every week or so.

Thanks for your input, and again, sorry for ignoring your reply for so long.

Cheers,
Greg.
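
P.S. For the cron-driven restart mentioned above, something along these lines might do. It is only a rough sketch: the control command name (apache2ctl, as I'd expect on Debian), the --run flag and the function names are my own assumptions, not anything mod_wsgi provides. By default it only prints what it would run.

```python
import subprocess
import sys

# Assumption: Debian's control wrapper is "apache2ctl"; many other systems
# ship it as "apachectl". Adjust for the host in question.
APACHE_CONTROL = "apache2ctl"


def build_restart_command(action="graceful"):
    """Return the argv list for an Apache restart."""
    if action not in ("graceful", "restart"):
        raise ValueError("unsupported action: %r" % (action,))
    return [APACHE_CONTROL, action]


def restart_apache(action="graceful", dry_run=True):
    """Run the restart command, or only print it when dry_run is true.

    Returns the exit status of the control command (0 for a dry run).
    """
    command = build_restart_command(action)
    if dry_run:
        print("would run: " + " ".join(command))
        return 0
    return subprocess.run(command, check=False).returncode


def main(argv):
    # Only touch Apache when --run is given explicitly (hypothetical flag).
    return restart_apache(dry_run="--run" not in argv)


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status != 0:
        sys.exit(status)
```

A weekly root crontab entry could then look like `0 4 * * 0 /usr/bin/python3 /usr/local/bin/restart_apache.py --run`, where the script path is hypothetical. Since the restart goes through Apache rather than being self initiated, the parent process knows about it, which is the failsafe described in the quoted reply.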

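
For checking whether the daemon processes still exist in the process table and in what state, a small helper like the one below could replace eyeballing ps. Again, just a sketch under assumptions: it assumes a Linux procps `ps`, and it assumes the daemon processes can be recognised by a marker such as "(wsgi:" in the command column (which, as far as I know, is what the display-name option of WSGIDaemonProcess gives you). The SAMPLE text is made up for illustration, not output from our server.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,stat,args` output into (pid, state, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, state, command = parts
        rows.append((int(pid), state, command))
    return rows


def find_daemons(rows, marker):
    """Keep only the rows whose command contains the given marker."""
    return [row for row in rows if marker in row[2]]


def is_zombie(row):
    """A state starting with 'Z' means the process is defunct."""
    return row[1].startswith("Z")


def current_process_table():
    """Return the live process table as text (Linux procps ps assumed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample so the parsing can be tried without a running Apache.
SAMPLE = """\
  PID STAT COMMAND
 1201 Sl   (wsgi:site-a)
 1202 Z    [apache2] <defunct>
 1203 S    /usr/sbin/apache2 -k start
"""

for pid, state, command in find_daemons(parse_ps(SAMPLE), "(wsgi:"):
    print(pid, state, command)
```

On the real host, `parse_ps(current_process_table())` would take the place of the sample, and any row for which `is_zombie` is true would be a process that still exists but has not been cleaned up.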