Yes, sir, my tests also seem to show it works as you intend it to. Thanks.
On Tue, Feb 3, 2015 at 5:20 AM, Graham Dumpleton <[email protected]> wrote:

> Should now be fixed.
>
> On 03/02/2015, at 8:50 PM, Graham Dumpleton <[email protected]> wrote:
>
> The application of the eviction timeout should now be fixed in the develop branch.
>
> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>
> Graham
>
> On 03/02/2015, at 5:02 PM, Graham Dumpleton <[email protected]> wrote:
>
>> On 3 February 2015 at 04:15, Kent Bower <[email protected]> wrote:
>>
>> On Sun, Feb 1, 2015 at 7:08 PM, Graham Dumpleton <[email protected]> wrote:
>>
>>> Your Flask client doesn't need to know about Celery, as your web application accepts requests as normal and it is your Python code which would queue the job with Celery.
>>>
>>> Now looking back, the only configuration I can find, though I don't know if it is your actual production configuration, is:
>>>
>>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>>
>>> Provided that you don't then start to have overall host memory issues, the simplest way around this whole issue is not to use a multithreaded process.
>>>
>>> What you would do is vertically partition your URL namespace so that just the URLs which do the long running report generation would be delegated to single threaded processes. Everything else would keep going to the multithreaded processes.
>>> WSGIDaemonProcess rarch processes=3 threads=2
>>> WSGIDaemonProcess rarch-long-running processes=6 threads=1 maximum-requests=20
>>>
>>> WSGIProcessGroup rarch
>>>
>>> <Location /suburl/of/long/running/report/generator>
>>> WSGIProcessGroup rarch-long-running
>>> </Location>
>>>
>>> You wouldn't even have to worry about the graceful-timeout on rarch-long-running, as that is only relevant for maximum-requests where it is a multithreaded process.
>>>
>>> So what would happen is that when the request has finished, if maximum-requests is reached, the process would be restarted even before any new request was accepted by the process, so there is no chance of a new request being interrupted.
>>>
>>> You could still set an eviction-timeout of some suitably large value to allow you to use SIGUSR1 to be sent to processes in that daemon process group to shut them down.
>>>
>>> In this case, having eviction-timeout able to be set independent of graceful-timeout (for maximum-requests) is probably useful, so I will retain the option.
>>>
>>> So is there any reason you couldn't use a daemon process group with many single threaded processes instead?
>>
>> This is very good to know (that single threaded procs would behave more ideally in these circumstances). The above was just my configuration for testing 'eviction-timeout'. Our software generally runs with many more processes and threads, on servers with maybe 16 or 32 GB RAM. And unfortunately, the RAM is the limiting resource here, as our python app, built on TurboGears, is a memory hog and we have yet to find the resources to dissect that. I was aiming to head in the direction of URL partitioning, but there are big obstacles. (Chiefly, RAM consumption would make threads=1 and yet more processes very difficult unless we spend the huge effort in dissecting the app to locate and pull the many unused memory hogging libraries out.)
>> So, URL partitioning is sort of the ideal, distant solution, as is a Celery-like polling solution, but out of my reach for now.
>
> Have you ever run a test where you compare the whole memory usage of your application where all URLs are visited, to how much memory is used if only the URL which generates the long running report is visited?
>
> In Django at least, a lot of stuff is lazily loaded only when a URL requiring it is first accessed. So even with a heavy code base, there can still be benefits in splitting out URLs to their own processes, because the whole code base wouldn't be loaded, due to the lazy loading.
>
> So do you have any actual memory figures from doing that?
>
> How many URLs are there that generate these reports vs those that don't, or is that all the whole application does?
>
> Are your most frequently visited URLs those generating the reports or something else?
>
>> Another question for multithreaded graceful-timeout with maximum-requests: during a period of heavy traffic, it seems the graceful-timeout setting just pushes the real timeout until shutdown-timeout because, if heavy enough, you'll be getting requests during graceful-timeout. That diminishes the fidelity of "graceful-timeout". Do you see where I'm coming from (even if you're happy with the design and don't want to mess with it, which I'd understand)?
>>
>> Ok, here is the log demonstrating the troubles I saw with eviction-timeout.
>> For demonstration purposes, here is the simplified directive I'm using:
>>
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>
>> Here is the log:
>>
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: generating secret for digest authentication ...
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: done
>> [Mon Feb 02 11:36:16 2015] [info] APR LDAP: Built with OpenLDAP LDAP SDK
>> [Mon Feb 02 11:36:16 2015] [info] LDAP: SSL support available
>> [Mon Feb 02 11:36:16 2015] [info] Init: Seeding PRNG with 256 bytes of entropy
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary RSA private keys (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary DH parameters (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Shared memory session cache initialised
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Python home /home/rarch/tg2env.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Initializing Python.
>> [Mon Feb 02 11:36:16 2015] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>> [Mon Feb 02 11:36:16 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Attach interpreter ''.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 11:39:13 2015] [info] mod_wsgi (pid=29447): Process eviction requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Daemon process graceful timer expired 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Shutdown requested 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Aborting process 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Exiting process 'rarch'.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has died, deregister and restart it.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has been deregistered and will no longer be monitored.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Python home /home/rarch/tg2env.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Initializing Python.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Attach interpreter ''.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>>
>> The process was signaled at 11:39:13 with eviction-timeout=60, but 11:40:13 came and passed and nothing happened until 107 seconds passed, at which time the graceful timer expired.
>> Next, I changed the parameters a little:
>>
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} eviction-timeout=30 graceful-timeout=240 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Python home /home/rarch/tg2env.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Initializing Python.
>> [Mon Feb 02 12:06:57 2015] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>> [Mon Feb 02 12:06:57 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Attach interpreter ''.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 12:07:19 2015] [info] mod_wsgi (pid=3381): Process eviction requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Daemon process graceful timer expired 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Shutdown requested 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Aborting process 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Exiting process 'rarch'.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has died, deregister and restart it.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has been deregistered and will no longer be monitored.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Python home /home/rarch/tg2env.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Initializing Python.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Attach interpreter ''.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>>
>> So, for me, eviction-timeout is apparently being ignored...
>
> The background monitor thread which monitors for expiry wasn't taking into consideration that the eviction timeout period could be less than the graceful timeout. I didn't see a problem because I was also setting a request timeout, which changes the way the monitor thread works, waking it up every second regardless. I will work on a fix for that.
>
> Another issue for consideration is if a graceful timeout is already in progress and a signal comes in for eviction, which timeout wins? Right now the eviction time will trump the graceful time if already set by maximum requests. The converse isn't true though: if already in the eviction cycle and maximum requests arrives, it wouldn't be trumped by the graceful timeout. So eviction time has authority, given that it was triggered by an explicit user signal. It does mean that the signal could effectively extend whatever graceful time was in progress.
>
> Graham
>
>> Thanks again for all your time and help,
>> Kent
>>
>>> Note that since only a subset of URLs would go to the daemon process group, the memory usage profile will change, as you aren't potentially loading the complete application code into those processes, only the code needed for that URL and that report. So it could use less memory than the application as a whole, allowing you to have multiple single threaded processes with no issue.
>>>
>>> Graham
>>>
>>> On 31/01/2015, at 12:31 AM, Kent <[email protected]> wrote:
>>>
>>> Thanks for your reply and recommendations.
>>> We're aware of the issues, but I didn't give the full picture for brevity's sake. The reports are user generated reports. Ultimately, the users know whether the reports should return quickly (which many, many will), or whether they are long-running. There is no way for the application to know that, so to avoid some sort of polling (which we've done in the past and was a pain in the rear to users), the design is to allow the *user* to decide whether to run the report in the background or "foreground" via a check box. Since most reports will return in a matter of a minute or so, we wanted to avoid the pain of making them poll, but I need to look at Celery. However, I'm not comfortable punishing users for accidentally choosing foreground on a long-running report. That is, not via an automatic turn-over mechanism like maximum-requests or inactivity-timeout. In my mind, those are inherently different from something like a SIGUSR1 mechanism, because the former are automatic.
>>>
>>> So, while admitting there are edge cases we are using that don't have a perfect solution (or even admitting we need a better mechanism in that case), it still seems to me mod_wsgi should be somewhat agnostic of design choices. In other words, when it comes to *automatic* turning over of processes, it seems mod_wsgi shouldn't be involved with length-of-time considerations, except to allow the user to specify timeouts. See, the long running reports are only one of my concerns: we also fight with database locks sometimes, held by another application attached to the same database and wholly out of our control. Sometimes those locks can be held for many minutes on a request that normally should complete within seconds. There too, it seems mod_wsgi should be very gentle in the automatic turnover cases.
>>>
>>> Thanks for pointing to Celery.
>>> I really wonder whether I can get a message broker to work with Adobe Flash, our current client, but I haven't looked into this much yet.
>>>
>>> Also, my apologies if you believe this to have been a waste of time on your part. You've been extremely helpful, though, and I'm quite thankful for your time! I understand you not wanting to redesign the shutdown-timeout thing and mess with what otherwise isn't broken. Would you still like me to post the apache debug logs regarding 'eviction-timeout', or have you changed your mind about releasing that? (In which case, extra apologies.)
>>>
>>> Kent
>>>
>>> On Friday, January 30, 2015 at 6:34:28 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> If you have web requests generating reports which take 40 minutes to run, you are going the wrong way about it.
>>>>
>>>> What would be regarded as best practice for long running requests is to use a task queuing system to queue up the task to be run and run it in a distinct set of processes to the web server. Your web request can then return immediately, with some sort of polling system used as necessary to check the progress of the task and allow the result to be downloaded when complete. By using a separate system to run the tasks, it doesn't matter whether the web server is restarted, as the tasks will still run, and after the web server is restarted a user can still check on progress of the tasks and get back his response.
>>>>
>>>> The most common such task execution system for doing this sort of thing is Celery.
>>>>
>>>> So it is because you aren't using the correct tool for the job here that you are fighting against things like timeouts in the web server. No web server is really a suitable environment to be used as an in-process task execution system.
>>>> The web server should handle requests quickly and offload longer processing tasks to a separate task system which is purpose built for handling the management of long running tasks.
>>>>
>>>> I am not inclined to keep fiddling with how the timeouts work now that I understand what you are trying to do. I am even questioning whether I should have introduced the separate eviction timeout I already did, given that it is turning out to be a questionable use case.
>>>>
>>>> I would really recommend you look at re-architecting how you do things. I don't think I would have any trouble finding others on the list who would advise the same thing and who could also give you further advice on using something like Celery instead for task execution.
>>>>
>>>> Graham
>>>>
>>>> On 29/01/2015, at 7:30 AM, Kent <[email protected]> wrote:
>>>>
>>>> Ok, I plan to run those tests with debug and post, but please, in the meantime:
>>>>
>>>> For our app, not interrupting existing requests is a higher priority than being able to accept new requests, particularly since we typically run many wsgi processes, each with a handful of threads. So, I'm not really concerned about maintaining always-available threads (statistically, I will be fine... that isn't the issue for me).
>>>>
>>>> In these circumstances, it would be much better for all these triggering events (SIGUSR1, maximum-requests, or inactivity-timeout, etc.) to immediately stop accepting new requests and "concentrate" on shutting down. (Unless that means requests waiting in apache are terminated because they were queued for this particular process, but I doubt apache has already determined the request's process if *none* are available, has it?)
>>>> With high graceful-timeout/eviction-timeout and low shutdown-timeout, I run a pretty high risk of accepting a new request at the tail end of graceful-timeout or eviction-timeout, only to have it basically doomed to an ungraceful death, because many of our requests are long running (very often well over 5 or 10 sec).
>>>>
>>>> I guess that's why, through experimentation with SIGUSR1 a few years back, I ended up with "graceful-timeout=5 shutdown-timeout=300"... the opposite of how it would default, because this works well when trying to signal these to recycle themselves: they basically immediately stop accepting new requests, so your "guaranteed" graceful timeout is 300. It seems I have no way to "guarantee" a very large graceful timeout for each and every request, even if affected by maximum-requests or inactivity-timeout, and specify a different (lower) one for SIGUSR1, because the only truly guaranteed lifetime in seconds is "shutdown-timeout," is that accurate?
>>>>
>>>> The ideal for our app, which may accept certain requests that run for several minutes, is this:
>>>>
>>>> - if maximum-requests or inactivity-timeout is hit, stop taking new requests immediately and shut down as soon as possible, but give existing requests basically all the time they need to finish (say, up to 40 minutes (for long-running db reports)).
>>>> - if SIGUSR1, stop taking new requests immediately and shut down as soon as possible, but give existing requests a really good chance to complete, maybe 3-5 minutes, but not the 40 minutes, because this is slightly more urgent (it was triggered manually and a user is monitoring/waiting for turnover and wants new code in place)
>>>>
>>>> I don't think I can accomplish the above, if I understand the design correctly, because a request may have been accepted at the tail end of graceful-timeout/eviction-timeout and so is only guaranteed a lifetime of shutdown-timeout, regardless of what the trigger was (SIGUSR1 vs. automatic).
>>>>
>>>> Is my understanding of this accurate?
>>>>
>>>> On Tuesday, January 27, 2015 at 9:48:01 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> Can you ensure that LogLevel is set to at least info and provide what messages are in the Apache error log file?
>>>>
>>>> If I use:
>>>>
>>>> $ mod_wsgi-express start-server hack/sleep.wsg --log-level=debug --verbose-debugging --eviction-timeout 30 --graceful-timeout 60
>>>>
>>>> which is equivalent to:
>>>>
>>>> WSGIDaemonProcess … graceful-timeout=60 eviction-timeout=30
>>>>
>>>> and fire a request against an application that sleeps a long time, I see in the Apache error logs at the time of the signal:
>>>>
>>>> [Wed Jan 28 13:34:34 2015] [info] mod_wsgi (pid=29639): Process eviction requested, waiting for requests to complete 'localhost:8000'.
>>>>
>>>> At the end of the 30 seconds given by the eviction timeout I see:
>>>>
>>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Daemon process graceful timer expired 'localhost:8000'.
>>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Shutdown requested 'localhost:8000'.
>>>>
>>>> Up till that point the process would still have been accepting new requests, and was waiting for the point where there were no active requests, to allow it to shut down.
>>>> As the timeout tripped at 30 seconds, it then instead goes into the more brutal shutdown process. No new requests are accepted from this point.
>>>>
>>>> For my setup the shutdown-timeout defaults to 5 seconds, and because the request still hadn't completed within 5 seconds, the process is exited anyway and allowed to shut down.
>>>>
>>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Aborting process 'localhost:8000'.
>>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Exiting process 'localhost:8000'.
>>>>
>>>> Because the application never returned a response, that results in the Apache child worker which was trying to talk to the daemon process seeing a truncated response.
>>>>
>>>> [Wed Jan 28 13:35:10 2015] [error] [client 127.0.0.1] Truncated or oversized response headers received from daemon process 'localhost:8000': /tmp/mod_wsgi-localhost:8000:502/htdocs/
>>>>
>>>> When the Apache parent process notices the daemon process has died, it cleans up and starts a new one.
>>>>
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 'localhost:8000' has died, deregister and restart it.
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 'localhost:8000' has been deregistered and will no longer be monitored.
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29764): Starting process 'localhost:8000' with threads=5.
>>>>
>>>> So the shutdown phase specified by shutdown-timeout is subsequent to eviction-timeout. It is one last chance to shut down, during a time that no new requests are accepted, in case it is the constant flow of requests that is preventing it, rather than one long running request.
>>>>
>>>> The shutdown-timeout should always be kept quite short because no new requests will be accepted during that time. So changing it from the default isn't something one would normally do.
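[Editor's sketch: a long-sleeping WSGI application of the kind being tested above. The thread doesn't show the contents of Graham's sleep script, so the body text and the 60-second default here are invented for illustration; the optional `sleep` parameter exists only so the callable can also be exercised quickly outside a server.]

```python
import time

# Minimal WSGI app that holds a request open, useful for observing how
# graceful-timeout / eviction-timeout interact with an in-flight request.
SLEEP_SECONDS = 60  # arbitrary choice for illustration

def application(environ, start_response, sleep=None):
    time.sleep(SLEEP_SECONDS if sleep is None else sleep)
    body = b"finally done\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Fire one request at this under a small eviction-timeout and the log sequence above (eviction requested, graceful timer expired, shutdown, abort) can be reproduced.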
>>>> Graham
>>>>
>>>> On 28/01/2015, at 3:02 AM, Kent <[email protected]> wrote:
>>>>
>>>> Let me be more specific. I'm having a hard time getting this to test as I expected. Here is my WSGIDaemonProcess directive:
>>>>
>>>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>>>
>>>> I put a 120 sec sleep in one of the processes' requests and then sent SIGUSR1 (Linux) to all three processes. The two inactive ones immediately restart, as I expect. However, the 3rd (sleeping) one is allowed to run past the 60 second eviction-timeout and runs straight to the graceful-timeout before it is terminated. Shouldn't it have been killed at 60 sec?
>>>>
>>>> (And then, as my previous question, how does shutdown-timeout factor into all this?)
>>>>
>>>> Thanks again!
>>>> Kent
>>>>
>>>> On Tuesday, January 27, 2015 at 9:34:12 AM UTC-5, Kent wrote:
>>>>
>>>> I think I might understand the difference between 'graceful-timeout' and 'shutdown-timeout', but can you please just clarify the difference? Are they additive?
>>>>
>>>> Also, will 'eviction-timeout' interact with either of those, or simply override them?
>>>>
>>>> Thanks,
>>>> Kent
>>>>
>>>> On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> Want to give:
>>>>
>>>> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>>>>
>>>> a go?
>>>>
>>>> The WSGIDaemonProcess directive is 'eviction-timeout'. For mod_wsgi-express the command line option is '--eviction-timeout'.
>>>>
>>>> So the terminology I am using around this is that sending a signal is like forcibly evicting the WSGI application, allowing the process to be restarted. At least this way we can have an option name that is distinct enough from a generic 'restart' so as not to be confusing.
>>>> Graham
>>>>
>>>> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>>>>
>>>> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>>>
>>>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> There are a few possibilities here of how this could be enhanced/changed.
>>>>
>>>> The problem with maximum-requests is that it can be dangerous. People can set it too low, and when their site gets a big spike of traffic the processes can be restarted too quickly, only adding to the load of the site and causing things to slow down, hampering their ability to handle the spike. This is where setting a longer amount of time for graceful-timeout helps, because you can set it to be quite large. The use of maximum-requests can still be like using a hammer though, and one which can be applied unpredictably.
>>>>
>>>> Yes, I can see that. (It may be overkill, but you could default a separate minimum-lifetime parameter so only users who specifically mess with that as well as maximum-requests shoot themselves in the foot, but it is starting to get confusing with all the different timeouts, I'll agree there...)
>>>>
>>>> The minimum-lifetime option is an interesting idea. It may have to do nothing by default to avoid conflicts with existing expected behaviour.
>>>>
>>>> The maximum-requests option also doesn't help in the case where you are running background threads which do stuff, and it is them, and not the number of requests coming in, that dictate things like the memory growth you want to counter.
>>>>
>>>> True, but solving with maximum lifetime... well, actually, solving memory problems with *any* of these mechanisms isn't measuring the heart of the problem, which is RAM.
>>>> I imagine there isn't a good way to measure RAM or you would have added that option by now. Seems what we are truly after for the majority of these isn't how many requests or how long it's been up, etc., but how much RAM it is taking (or perhaps, optionally, average RAM per thread, instead). If my process exceeds consuming 1.5GB, then trigger a graceful restart at the next appropriate convenience, being gentle to existing requests. That may arguably be the most useful parameter.
>>>>
>>>> The problem with calculating memory is that there isn't one cross platform portable way of doing it. On Linux you have to dive into the /proc file system. On MacOS X you can use C API calls. On Solaris I think you again need to dive into a /proc file system, but it obviously has a different file structure for getting details out compared to Linux. Adding such cross platform stuff in gets a bit messy.
>>>>
>>>> What I was moving towards, as an extension of the monitoring stuff I am doing for mod_wsgi, was to have a special daemon process you can set up which has access to some sort of management API. You could then create your own Python script that runs in that and which, using the management API, can get daemon process pids and then use Python psutil to get memory usage on a periodic basis, and then you decide if a process should be restarted and send it a signal to stop. Or the management API could provide a way to notify, maybe by signal, or maybe using a shared memory flag, that a daemon process should shut down.
>>>>
>>>> I figured there was something making that a pain...
>>>>
>>>> So the other option I have contemplated adding a number of times is one to periodically restart the process. The way this would work is that a process restart would be done periodically based on what time was specified.
>>>> You could therefore say the restart interval was 3600 and it would restart the process once an hour.
>>>>
>>>> The start of the time period for this would either be when the process was created, if any Python code or a WSGI script was preloaded at process start time, or it would be from when the first request arrived if the WSGI application was lazily loaded. This restart-interval could be tied to the graceful-timeout option so that you can set an extended period if you want to try and ensure that requests are not interrupted.
>>>>
>>>> We just wouldn't want it to die having never even served a single request, so my vote would be *against* the birth of the process as the beginning point (and, rather, for the first request).
>>>>
>>>> It would effectively be from the first request if lazily loaded. If preloaded though, as background threads could be created which do stuff and consume memory over time, it would then be from when the process started, i.e., when the Python code was preloaded.
>>>>
>>>> But then, for preloaded, processes life-cycle themselves for no reason throughout inactive periods, like maybe overnight. That's not the end of the world, but I wonder if we're catering to the wrong design. (These are, after all, webserver processes, so it seems a fair assumption that they exist primarily to handle requests, else why even run under apache?) My vote, for what it's worth, would still be timed from first request, but I probably won't use that particular option. Either way would be useful for some, I'm sure.
>>>>
>>>> Now we have the ability to send the process a graceful restart signal (usually SIGUSR1), to force an individual process to restart.
>>>>
>>>> Right now this is tied to the graceful-timeout duration as well, which, as you point out, would perhaps be better off having its own time duration for the notional grace period.
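[Editor's sketch: the external memory monitor Graham describes earlier (poll daemon process memory, signal the process if it is too big), written against Linux /proc instead of psutil so it is dependency-free. Linux-only; the helper names are hypothetical and the 1.5GB threshold echoes Kent's example. SIGUSR1 is the graceful restart signal mentioned throughout the thread.]

```python
import os
import signal

def rss_kb(pid):
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def evict_if_bloated(pids, limit_kb=1_500_000):
    """Send SIGUSR1 (graceful restart) to any daemon process over the limit."""
    for pid in pids:
        if rss_kb(pid) > limit_kb:
            os.kill(pid, signal.SIGUSR1)
```

Run on a periodic timer, this approximates the "restart at 1.5GB, gently" behaviour Kent asks for, without mod_wsgi itself needing cross-platform memory accounting.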
>>>> Using the name restart-timeout for this could be confusing if I have a restart interval option.
>>>>
>>>> In my opinion, SIGUSR1 is different from the automatic parameters because it was (most likely) triggered by user intervention, so that one should ideally have its own parameter. If that is the case and this parameter becomes dedicated to SIGUSR1, then the least ambiguous name I can think of is *sigusr1-timeout*.
>>>>
>>>> Except that it isn't guaranteed to be called SIGUSR1. Technically it could be a different signal dependent on the platform that Apache runs on. But then, as far as I know, all UNIX systems do use SIGUSR1.
>>>>
>>>> In any case, they are "signals": do you like *signal-timeout*? (Also could be taken ambiguously, but maybe less so than restart-timeout?)
>>>>
>>>> I also have another type of process restart I am trying to work out how to accommodate, and the naming of options again complicates the problem. In this case we want to introduce an artificial restart delay.
>>>>
>>>> This would be an option to combat the problem which is being caused by Django 1.7, in that WSGI script file loading for Django isn't stateless. If a transient problem occurs, such as the database not being ready, the loading of the WSGI script file can fail. On the next request an attempt is made to load it again, but now Django kicks up a stink because it was half way through setting things up last time when it failed, and the setup code cannot be run a second time. The result is that the process then keeps failing.
>>>>
>>>> The idea of the restart delay option therefore is to allow you to set it to a number of seconds, normally just 1.
If set like that, if a WSGI script file import fails, it will effectively block for the delay specified, and when that is over it will kill the process, so the whole process is thrown away and the WSGI script file can be reloaded in a fresh process. This gets rid of the problem of Django initialisation not being able to be retried.
>>>>
>>>> (We are using TurboGears... I don't think I've seen something like that happen, but we have rarely seen start-up anomalies.)
>>>>
>>>> A delay is needed to avoid an effective fork bomb, where a WSGI script file not loading under high request throughput would cause a constant cycle of processes dying and being replaced. It is possible it wouldn't be as bad as I think, as Apache only checks for dead processes to replace once a second, but I still prefer my own failsafe in case that changes.
>>>>
>>>> I am therefore totally fine with a separate graceful time period for when SIGUSR1 is used; I just need to juggle these different features and come up with an option naming scheme that makes sense.
>>>>
>>>> How about then that I add the following new options:
>>>>
>>>> maximum-lifetime - Similar to maximum-requests in that it will cause the processes to be shut down and restarted, but in this case it will occur based on the time period given as argument, measured from the first request, or from when the WSGI script file or any other Python code was preloaded, that is, in the latter case, from when the process was started.
>>>>
>>>> restart-timeout - Specifies a separate grace period for when the process is being forcibly restarted using the graceful restart signal. If restart-timeout is not specified and graceful-timeout is specified, then the value of graceful-timeout is used. If neither is specified, then the restart signal will behave similarly to the process being sent a SIGINT.
>>>> linger-timeout - When a WSGI script file, or other Python code, is being imported by mod_wsgi directly, if that fails the default is that the error is ignored. For a WSGI script file, reloading will be attempted on the next request. But if preloading code, then it will fail and merely be logged. If linger-timeout is specified with a non-zero value, the value being seconds, then the daemon process will instead be shut down and restarted, to try and allow a successful reloading of the code to occur if it was a transient issue. To avoid a fork bomb in the case of a persistent issue, a delay will be introduced based on the value of the linger-timeout option.
>>>>
>>>> How does that all sound, if it makes sense that is. :-)
>>>>
>>>> That sounds absolutely great! How would I get on the notification cc: of the ticket or whatever so I'd be informed of progress on that?
>>>>
>>>> These days my turnaround time is pretty quick so long as I am happy and know what to change and how. So I just need to think a bit more about it and get some day job stuff out of the way before I can do something.
>>>>
>>>> So don't be surprised if you simply get a reply to this email within a week pointing at a development version to try.
>>>>
>>>> Well, tons of thanks again.
>>>>
>>>> Graham
>>>>
>>>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>>>
>>>> Thanks again. Yes, I did take our current version from the repo because you hadn't released the SIGUSR1 bit yet... I should upgrade now.
>>>>
>>>> As for the very long graceful-timeout, I was skirting around that solution because I like where the setting is currently for SIGUSR1. I would like to ask, "Is there a way to indicate a different graceful-timeout for handling SIGUSR1 vs. maximum-requests?" but I already have the answer from the release notes: "No."
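Taken together, the three proposals above might look something like the following in an Apache configuration. This is a hypothetical sketch: maximum-lifetime, restart-timeout, and linger-timeout were still proposed names at this point in the thread, so check the release notes of your mod_wsgi version before relying on them.

```apache
# Hypothetical sketch only: maximum-lifetime, restart-timeout and
# linger-timeout are the option names proposed in this thread, not
# confirmed against a released mod_wsgi version.
WSGIDaemonProcess example processes=3 threads=2 \
    maximum-lifetime=3600 \
    graceful-timeout=600 \
    restart-timeout=140 \
    linger-timeout=1
```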
>>>> I don't know if you can see the value in distinguishing the two, but maximum-requests is sort of a "standard operating mode," so it might make sense for a mod_wsgi user to want a higher, very gentle mode of operation there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still "means business," so the same user may want a shorter, more responsive timeout there (while still allowing a decent chunk of time for being graceful to running requests). That is the case for me at least. Any chance you'd entertain that as a feature request?
>>>>
>>>> Even if not, you've been extremely helpful, thank you! And thanks for pointing me to the correct version of the documentation: I thought I was reading the current version.
>>>>
>>>> Kent
>>>>
>>>> P.S. I also have ideas for possible vertical URL partitioning, but unfortunately, our app has much cross-over by URL, so that's why I'm down this maximum-requests path...
>>>>
>>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>>>
>>>> I'm running 4 (a very early version of it, possibly before you officially released it). We upgraded to take advantage of the amazingly helpful SIGUSR1 signaling for graceful process restarting, which we use somewhat regularly to gracefully deploy software changes (minor ones which won't matter if 2 processes have different versions loaded) without disrupting users. Thanks a ton for that!
>>>>
>>>> SIGUSR1 support was released in version 4.1.0.
>>>>
>>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>>>
>>>> That same version has all the other stuff which was changed, so long as the actual 4.1.0 is being used and you aren't still using the repo from before the official release.
>>>>
>>>> If not sure, it's best just to upgrade to the latest version if you can.
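The graceful-deploy workflow described here, sending SIGUSR1 to the daemon processes, can be scripted. Below is a hedged sketch, not anything shipped with mod_wsgi: it assumes display-name=%{GROUP} was set so the daemon processes show up as "(wsgi:GROUP)" in process listings, and it is Linux-specific since it reads /proc directly.

```python
import os
import signal

# Hypothetical deploy helper (not part of mod_wsgi). Assumes the daemon
# process group was configured with display-name=%{GROUP}, so its
# processes appear as "(wsgi:GROUP)". Linux-only: scans /proc/<pid>/cmdline
# instead of shelling out to pgrep.
def graceful_restart(group):
    pattern = "(wsgi:" + group + ")"
    signalled = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or entry == str(os.getpid()):
            continue  # not a pid, or our own process
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # process exited, or not readable
        if pattern in cmdline:
            # Deliver the graceful-restart signal (SIGUSR1 on UNIX).
            os.kill(int(entry), signal.SIGUSR1)
            signalled.append(int(entry))
    return signalled
```

Run as a user with permission to signal the Apache-owned daemon processes; it returns the list of pids it signalled, which is empty if no process matched the group name.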
>>>> We are also multi-threading our processes (plural processes, plural threads).
>>>>
>>>> Some requests could be (validly) running for very long periods of time (database reporting, maybe even half an hour, though that would be very extreme).
>>>>
>>>> Some processes (especially those generating .pdfs, for example) hog tons of RAM, as you know, so I'd like these to eventually check their RAM back in, so to speak, by utilizing either inactivity-timeout or maximum-requests, but always in a very gentle way, since, as I mentioned, some requests might be properly running, even though for many minutes. maximum-requests seems too brutal for my use case, since the threshold request sends the process down the graceful-timeout/shutdown-timeout path, even if there are valid requests running, and then SIGKILLs. My ideal vision of "maximum-requests," since it is *primarily for memory management,* is to be very gentle, sort of a "ok, now that I've hit my threshold, at my next earliest convenience, I should die, but only once all my current requests have ended of their own accord."
>>>>
>>>> That is where, if you vertically partition those URLs out to a separate daemon process group, you can simply set a very high graceful-timeout value.
>>>>
>>>> So relying on the feature:
>>>>
>>>> """
>>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is applied in a number of circumstances.
>>>>
>>>> When maximum-requests and this option are used together, when maximum requests is reached, rather than immediately shutting down, potentially interrupting active requests if they don't finish within the shutdown timeout, you can specify a separate graceful shutdown period. If all requests are completed within this time frame then it will shut down immediately, otherwise the normal forced shutdown kicks in.
>>>> In some respects this is just allowing a separate shutdown timeout in cases where requests could be interrupted, and avoiding it if possible.
>>>> """
>>>>
>>>> You could set:
>>>>
>>>> maximum-requests=20 graceful-timeout=600
>>>>
>>>> So as soon as it hits 20 requests, it switches to a mode where it will restart when there are no requests. You can set that timeout as high as you want, even hours, and it will not care.
>>>>
>>>> "inactivity-timeout" seems to function exactly as I want, in that it seems like it won't ever kill a process with a thread with an active request (at least, I can't get it to, even by adding a long import time;time.sleep(longtime)... it doesn't seem to die until the request is finished). But that's why the documentation made me nervous, because it implies that it *could*, in fact, kill a proc with an active request: *"For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."*
>>>>
>>>> The release notes for 4.1.0 say:
>>>>
>>>> """
>>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in the daemon process being restarted after the idle timeout period where there are no active requests. Previously it would also interrupt a long running request. See the new request-timeout option for a way of interrupting long running, potentially blocked requests and restarting the process.
>>>> """
>>>>
>>>> I'd rather have a more gentle "maximum-requests" than "inactivity-timeout" because then, even on very heavy days (when RAM is most likely to choke), I could gracefully turn over these processes a couple of times a day, which I couldn't do with "inactivity-timeout" on an extremely heavy day.
>>>>
>>>> Hope this makes sense. I'm really asking:
>>>>
>>>> 1.
>>>> whether inactivity-timeout triggering will ever SIGKILL a process with an active request, as the docs intimate
>>>>
>>>> No, from 4.1.0 onwards.
>>>>
>>>> 2. whether there is any way to get maximum-requests to behave more gently under all circumstances
>>>>
>>>> By setting a very, very long graceful-timeout.
>>>>
>>>> 3. for your ideas/best advice
>>>>
>>>> Have a good read through the release notes for 4.1.0.
>>>>
>>>> Not necessarily useful in your case, but also look at request-timeout. It can act as a final failsafe for when things are stuck, but since it is more of a failsafe, it doesn't make use of graceful-timeout.
>>>>
>>>> Graham
>>>>
>>>> Thanks for your help!
>>>>
>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote:
>>>>
>>>> > Graham, the docs state: "For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."
>>>> >
>>>> > This implies to me that a running request that is taking a long time could actually be killed as if it were idle (suppose it were fetching a very slow database query). Is this the case?
>>>>
>>>> This is the case for mod_wsgi prior to version 4.0.
>>>>
>>>> Things have changed in mod_wsgi 4.X.
>>>>
>>>> How long are your long running requests though? The inactivity-timeout was more about restarting infrequently used applications so that memory can be taken back.
>>>>
>>>> > Also, I'm looking for an ultra-conservative and graceful method of recycling memory. I've read your article on URL partitioning, which was useful, but sooner or later, one must rely on either inactivity-timeout or maximum-requests, is that accurate?
>>>> > But both these will eventually, after the graceful timeout/shutdown timeout, potentially kill active requests. It is valid for our app to handle long-running reports, so I was hoping for an ultra-safe mechanism.
>>>> > Do you have any advice here?
>>>>
>>>> The options available in mod_wsgi 4.X are much better in this area than 3.X. The changes in 4.X aren't covered in the main documentation though, and are only described in the release notes where the change was made.
>>>>
>>>> In 4.X the concept of an inactivity-timeout is now separate from the idea of a request-timeout. There is also a graceful-timeout that can be applied to maximum-requests, and some other situations as well, to allow requests to finish out properly before being more brutal. One can also signal the daemon processes to do a more graceful restart as well.
>>>>
>>>> You cannot totally avoid having to be brutal though and kill things, else you don't have a failsafe for a stuck process where all request threads were blocked on back-end services and were never going to recover. Use of multithreading in a process also complicates the implementation of request-timeout.
>>>>
>>>> Anyway, the big question is what version you are using?
>>>>
>>>> Graham
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>> For more options, visit https://groups.google.com/d/optout.
