Should now be fixed.

On 03/02/2015, at 8:50 PM, Graham Dumpleton <[email protected]> wrote:

> The application of the eviction timeout should now be fixed in the develop branch.
> 
>     https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
> 
> Graham
> 
> On 03/02/2015, at 5:02 PM, Graham Dumpleton <[email protected]> 
> wrote:
> 
>> 
>> 
>> On 3 February 2015 at 04:15, Kent Bower <[email protected]> wrote:
>> On Sun, Feb 1, 2015 at 7:08 PM, Graham Dumpleton 
>> <[email protected]> wrote:
>> Your Flask client doesn't need to know about Celery, as your web application 
>> accepts requests as normal and it is your Python code which would queue the 
>> job with Celery.
>> 
>> Now looking back, the only configuration I can find, though I don't know 
>> if it is your actual production configuration, is:
>> 
>>     WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 
>> display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 
>> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>> 
>> Provided that you don't then start to have overall host memory issues, the 
>> simplest way around this whole issue is not to use a multithreaded process.
>>  
>> What you would do is vertically partition your URL namespace so that just 
>> the URLs which do the long running report generation would be delegated to 
>> single threaded processes. Everything else would keep going to the 
>> multithreaded processes.
>> 
>>     WSGIDaemonProcess rarch processes=3 threads=2
>>     WSGIDaemonProcess rarch-long-running processes=6 threads=1 
>> maximum-requests=20
>> 
>>     WSGIProcessGroup rarch
>> 
>>     <Location /suburl/of/long/running/report/generator>
>>     WSGIProcessGroup rarch-long-running
>>     </Location>
>> 
>> You wouldn't even have to worry about the graceful-timeout on 
>> rarch-long-running as that is only relevant for maximum-requests where it 
>> is a multithreaded process.
>> 
>> So what would happen is that when the request has finished, if 
>> maximum-requests is reached, the process would be restarted even before any 
>> new request was accepted by the process, so there is no chance of a new 
>> request being interrupted.
>> 
>> You could still set an eviction-timeout of some suitably large value to 
>> allow you to use SIGUSR1 to be sent to processes in that daemon process 
>> group to shut them down.
>> 
>> In this case, having eviction-timeout being able to be set independent of 
>> graceful-timeout (for maximum-requests), is probably useful and so I will 
>> retain the option.
>> 
>> So is there any reason you couldn't use a daemon process group with many 
>> single threaded process instead?
>> 
>> 
>> This is very good to know (that single threaded procs would behave more 
>> ideally in these circumstances).  The above was just my configuration for 
>> testing 'eviction-timeout'.  Our software generally runs with many more 
>> processes and threads, on servers with maybe 16 or 32 GB RAM.  And 
>> unfortunately, the RAM is the limiting resource here as our python app, 
>> built on turbo-gears, is a memory hog and we have yet to find the resources 
>> to dissect that.  I was aiming to head in the direction of URL partitioning, 
>> but there are big obstacles.  (Chiefly, RAM consumption would make threads=1 
>> and yet more processes very difficult unless we spend the huge effort in 
>> dissecting the app to locate and pull the many unused memory hogging 
>> libraries out.)
>> 
>> So, URL partitioning is sort of the ideal, distant solution, as well as a 
>> Celery-like polling solution, but out of my reach for now.
>> 
>> Have you ever run a test where you compare the whole memory usage of your 
>> application where all URLs are visited, to how much memory is used if only 
>> the URL which generates the long running report is visited?
>> 
>> In Django at least, a lot of stuff is lazily loaded only when a URL 
>> requiring it is first accessed. So even with a heavy code base, there can 
>> still be benefits in splitting out URLs to their own processes because the 
>> whole code base wouldn't be loaded due to the lazy loading.
>> 
>> So do you have any actual memory figures from doing that?
>> 
>> How many URLs are there that generate these reports vs those that don't, or 
>> is that all the application does?
>> 
>> Are your most frequently visited URLs those generating the reports or 
>> something else?
>>  
>> Another question for multithreaded graceful-timeout with maximum-requests:  
>> during a period of heavy traffic, it seems the graceful-timeout setting just 
>> pushes the real timeout until shutdown-timeout because, if heavy enough, 
>> you'll be getting requests during graceful-timeout.  That diminishes the 
>> fidelity of "graceful-timeout."  Do you see where I'm coming from (even if 
>> you're happy with the design and don't want to mess with it, which I'd 
>> understand)?
>> 
>> 
>> Ok, here is the log demonstrating the troubles I saw with eviction-timeout.  
>> For demonstration purposes, here is the simplified directive I'm using:
>> 
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} 
>> graceful-timeout=140 eviction-timeout=60 
>> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>> 
>> Here is the log:
>> 
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for 
>> SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: 
>> mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: generating secret for digest 
>> authentication ...
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: done
>> [Mon Feb 02 11:36:16 2015] [info] APR LDAP: Built with OpenLDAP LDAP SDK
>> [Mon Feb 02 11:36:16 2015] [info] LDAP: SSL support available
>> [Mon Feb 02 11:36:16 2015] [info] Init: Seeding PRNG with 256 bytes of 
>> entropy
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary RSA private 
>> keys (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary DH parameters 
>> (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Shared memory session cache initialised
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for 
>> SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: 
>> mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Starting process 
>> 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Python home 
>> /home/rarch/tg2env.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Initializing Python.
>> [Mon Feb 02 11:36:16 2015] [notice] Apache/2.2.3 (CentOS) configured -- 
>> resuming normal operations
>> [Mon Feb 02 11:36:16 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Attach interpreter 
>> ''.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447, process='rarch', 
>> application=''): Loading WSGI script 
>> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 11:39:13 2015] [info] mod_wsgi (pid=29447): Process eviction 
>> requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Daemon process 
>> graceful timer expired 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Shutdown requested 
>> 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Aborting process 
>> 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Exiting process 
>> 'rarch'.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has 
>> died, deregister and restart it.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has 
>> been deregistered and will no longer be monitored.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Starting process 
>> 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Python home 
>> /home/rarch/tg2env.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Initializing Python.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Attach interpreter 
>> ''.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331, process='rarch', 
>> application=''): Loading WSGI script 
>> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> 
>> The process was signaled at 11:39:13 with eviction-timeout=60, but 11:40:13 
>> came and went and nothing happened until 107 seconds had passed, at which 
>> point the graceful timer expired.
>> 
>> 
>> Next, I changed the parameters a little:
>> 
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} 
>> eviction-timeout=30 graceful-timeout=240 
>> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>> 
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Starting process 
>> 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Python home 
>> /home/rarch/tg2env.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Initializing Python.
>> [Mon Feb 02 12:06:57 2015] [notice] Apache/2.2.3 (CentOS) configured -- 
>> resuming normal operations
>> [Mon Feb 02 12:06:57 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Attach interpreter ''.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381, process='rarch', 
>> application=''): Loading WSGI script 
>> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 12:07:19 2015] [info] mod_wsgi (pid=3381): Process eviction 
>> requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Daemon process 
>> graceful timer expired 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Shutdown requested 
>> 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Aborting process 
>> 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Exiting process 
>> 'rarch'.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has 
>> died, deregister and restart it.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has 
>> been deregistered and will no longer be monitored.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Starting process 
>> 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Python home 
>> /home/rarch/tg2env.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Initializing Python.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Attach interpreter ''.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028, process='rarch', 
>> application=''): Loading WSGI script 
>> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> 
>> 
>> So, for me, eviction-timeout is apparently being ignored...
>> 
>> The background monitor thread which monitors for expiry wasn't taking into 
>> consideration that the eviction timeout period could be less than the 
>> graceful timeout. I didn't see the problem because I was also setting a 
>> request timeout, which changes the way the monitor thread works, waking it 
>> up every second regardless. I will work on a fix for that.
>> 
>> Another issue for consideration is if a graceful timeout is already in 
>> progress and a signal comes in for eviction, which timeout wins? Right now 
>> the eviction timeout will trump the graceful timeout if the latter was 
>> already set by maximum requests. The converse isn't true though: if already 
>> in an eviction cycle and maximum requests arrives, it wouldn't be trumped 
>> by the graceful timeout. So the eviction timeout has authority, given that 
>> it was triggered by an explicit user signal. It does mean that the signal 
>> could effectively extend whatever graceful time was in progress.
>> 
>> Graham
>>  
>> Thanks again for all your time and help,
>> Kent
>> 
>> 
>> 
>> Note that since only a subset of URLs would go to the daemon process group, 
>> the memory usage profile will change, as you aren't potentially loading the 
>> complete application code into those processes, only the parts needed for 
>> that URL and that report. So it could use less memory than the application 
>> as a whole, allowing you to have multiple single threaded processes with no 
>> issue.
>> 
>> Graham
>> 
>> On 31/01/2015, at 12:31 AM, Kent <[email protected]> wrote:
>> 
>>> Thanks for your reply and recommendations.  We're aware of the issues, but 
>>> I didn't give the full picture for brevity's sake.  The reports are user 
>>> generated reports.  Ultimately, the users know whether the reports should 
>>> return quickly (which many, many will), or whether they are long-running.  
>>> There is no way for the application to know that, so to avoid some sort of 
>>> polling (which we've done in the past and was a pain in the rear to users), 
>>> the design is to allow the user to decide whether to run the report in the 
>>> background or "foreground" via a check box.  Since most reports will return 
>>> in a matter of a minute or so, we wanted to avoid the pain of making them 
>>> poll, but I need to look at Celery.  However, I'm not comfortable punishing 
>>> users for accidentally choosing foreground on a long-running report.  That 
>>> is, not for an automatic turn-over mechanism like maximum-requests or 
>>> inactivity-timeout.  In my mind, those are inherently different than 
>>> something like a SIGUSR1 mechanism because the former are automatic.  
>>> 
>>> So, while admitting there are edge cases we are using that don't have a 
>>> perfect solution (or even admitting we need a better mechanism in that 
>>> case), it still seems to me mod_wsgi should be somewhat agnostic of design 
>>> choices.  In other words, when it comes to automatic turning over of 
>>> processes, it seems mod_wsgi shouldn't be involved with length of time 
>>> considerations, except to allow the user to specify timeouts.  See, the 
>>> long running reports are only one of my concerns: we also fight with 
>>> database locks sometimes, held by another application attached to the same 
>>> database and wholly out of our control.  Sometimes those locks can be held 
>>> for many minutes on a request that normally should complete within seconds. 
>>>  There too, it seems mod_wsgi should be very gentle in the automatic 
>>> turnover cases.
>>> 
>>> Thanks for pointing to Celery.  I really wonder whether I can get a message 
>>> broker to work with Adobe Flash, our current client, but I haven't looked 
>>> into this much yet.
>>> 
>>> Also, my apologies if you believe this to have been a waste of time on your 
>>> part.  You've been extremely helpful, though and I'm quite thankful for 
>>> your time!  I understand you not wanting to redesign the shutdown-timeout 
>>> thing and mess with what otherwise isn't broken.  Would you still like me 
>>> to post the apache debug logs regarding 'eviction-timeout' or have you 
>>> changed your mind about releasing that?  (In which case, extra apologies.)
>>> 
>>> Kent
>>> 
>>> 
>>> 
>>> 
>>> On Friday, January 30, 2015 at 6:34:28 AM UTC-5, Graham Dumpleton wrote:
>>> If you have web requests generating reports which take 40 minutes to run, 
>>> you are going the wrong way about it.
>>> 
>>> What would be regarded as best practice for long running requests is to use 
>>> a task queuing system to queue up the task to be run and run it in a 
>>> distinct set of processes to the web server. Your web request can then 
>>> return immediately, with some sort of polling system used as necessary to 
>>> check the progress of the task and allow the result to be downloaded when 
>>> complete. By using a separate system to run the tasks, it doesn't matter 
>>> whether the web server is restarted as the tasks will still run and after 
>>> the web server is restarted, a user can still check on progress of the 
>>> tasks and get back his response.
>>> 
>>> The most common such task execution system for doing this sort of thing is 
>>> Celery.
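The queue-and-poll pattern described above can be sketched with a stdlib worker thread standing in for a real Celery worker; the function names and the report arguments below are illustrative assumptions, not anything from this thread:

```python
import queue
import threading
import uuid

# Hedged sketch of the queue-and-poll pattern: the web request queues the
# job and returns a task id immediately; a worker (stand-in for a Celery
# worker process) runs it; a later request polls for the result.
tasks = queue.Queue()
results = {}  # task_id -> finished report, filled in by the worker

def worker():
    while True:
        task_id, report_args = tasks.get()
        # ... long-running report generation would happen here ...
        results[task_id] = "report for %r" % (report_args,)
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit_report(report_args):
    """Called from the web request: queue the job, return immediately."""
    task_id = str(uuid.uuid4())
    tasks.put((task_id, report_args))
    return task_id

def poll_report(task_id):
    """Called from a later polling request; None until the job is done."""
    return results.get(task_id)

task_id = submit_report({"year": 2015})
tasks.join()  # in real use the client polls rather than blocking
print(poll_report(task_id))
```

With Celery the shape is the same: submitting becomes `task.delay(...)` and polling becomes checking the returned `AsyncResult`, with the worker surviving any web server restart.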
>>> 
>>> So it is because you aren't using the correct tool for the job here that 
>>> you are fighting against things like timeouts in the web server. No web 
>>> server is really a suitable environment to be used as an in-process task 
>>> execution system. The web server should handle requests quickly and 
>>> offload longer processing tasks to a separate task system which is purpose 
>>> built for handling the management of long running tasks.
>>> 
>>> I am not inclined to keep fiddling how the timeouts work now I understand 
>>> what you are trying to do. I am even questioning now whether I should have 
>>> introduced the separate eviction timeout I already did given that it is 
>>> turning out to be a questionable use case.
>>> 
>>> I would really recommend you look at re-architecting how you do things. I 
>>> don't think I would have any trouble finding others on the list who would 
>>> advise the same thing and who could also give you further advice on using 
>>> something like Celery instead for task execution.
>>> 
>>> Graham
>>> 
>>> On 29/01/2015, at 7:30 AM, Kent <[email protected]> wrote:
>>> 
>>> Ok, I plan to run those tests with debug and post, but please, in the 
>>> meantime:
>>> 
>>> For our app, not interrupting existing requests is a higher priority than 
>>> being able to accept new requests, particularly since we typically run many 
>>> wsgi processes, each with a handful of threads.  So, I'm not really 
>>> concerned about maintaining always available threads (statistically, I will 
>>> be fine... that isn't the issue for me).  
>>> 
>>> In these circumstances, it would be much better for all these triggering 
>>> events (SIGUSR1, maximum-requests, or inactivity-timeout, etc.) to 
>>> immediately stop accepting new requests and "concentrate" on shutting down. 
>>>  (Unless that means requests waiting in apache are terminated because they 
>>> were queued for this particular process, but I doubt apache has already 
>>> determined the request's process if none are available, has it?)  With high 
>>> graceful-timeout/eviction-timeout and low shutdown-timeout, I run a pretty 
>>> high risk of accepting a new request at the tail end of graceful-timeout or 
>>> eviction-timeout, only to have it basically doomed to ungraceful death 
>>> because many of our requests are long running (very often well over 5 or 10 
>>> sec).
>>> 
>>> I guess that's why, through experimentation with SIGUSR1 a few years back, 
>>> I ended up "graceful-timeout=5 shutdown-timeout=300" ... the opposite of 
>>> how it would default, because this works well when trying to signal these 
>>> to recycle themselves: they basically immediately stop accepting new 
>>> requests so your "guaranteed" graceful timeout is 300.  It seems I have no 
>>> way to "guarantee" a very large graceful timeout for each and every 
>>> request, even if affected by maximum-requests or inactivity-timeout, and 
>>> specify a different (lower) one for SIGUSR1 because the only truly 
>>> guaranteed lifetime in seconds is "shutdown-timeout," is that accurate?
>>> 
>>> The ideal for our app, which may accept certain requests that run for 
>>> several minutes, is this:
>>> 
>>> - if maximum-requests or inactivity-timeout is hit, stop taking new 
>>> requests immediately and shut down as soon as possible, but give existing 
>>> requests basically all the time they need to finish (say, up to 40 minutes 
>>> for long-running db reports)
>>> 
>>> - if SIGUSR1, stop taking new requests immediately and shut down as soon 
>>> as possible, but give existing requests a really good chance to complete, 
>>> maybe 3-5 minutes, but not the 40 minutes, because this is slightly more 
>>> urgent (was triggered manually and a user is monitoring/waiting for 
>>> turnover and wants new code in place)
>>> I don't think I can accomplish the above if I understand the design 
>>> correctly because a request may have been accepted at the tail end of 
>>> graceful-timeout/eviction-timeout and so is only guaranteed a lifetime of 
>>> shutdown-timeout, regardless of what the trigger was (SIGUSR1 vs. 
>>> automatic).
>>> 
>>> Is my understanding of this accurate?
>>> 
>>> 
>>> 
>>> On Tuesday, January 27, 2015 at 9:48:01 PM UTC-5, Graham Dumpleton wrote:
>>> Can you ensure that LogLevel is set to at least info and provide what 
>>> messages are in the Apache error log file
>>> 
>>> If I use:
>>> 
>>>     $ mod_wsgi-express start-server hack/sleep.wsgi --log-level=debug 
>>> --verbose-debugging --eviction-timeout 30 --graceful-timeout 60
>>> 
>>> which is equivalent to:
>>> 
>>>     WSGIDaemonProcess … graceful-timeout=60 eviction-timeout=30
>>> 
>>> and fire a request against an application that sleeps a long time, I see 
>>> in the Apache error logs at the time of the signal:
>>> 
>>> [Wed Jan 28 13:34:34 2015] [info] mod_wsgi (pid=29639): Process eviction 
>>> requested, waiting for requests to complete 'localhost:8000'.
>>> 
>>> At the end of the 30 seconds given by the eviction timeout I see:
>>> 
>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Daemon process 
>>> graceful timer expired 'localhost:8000'.
>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Shutdown requested 
>>> 'localhost:8000'.
>>> 
>>> Up till that point the process would still have been accepting new 
>>> requests and was waiting for the point where there were no active requests 
>>> to allow it to shut down.
>>> 
>>> As the timeout tripped at 30 seconds, it then instead goes into the more 
>>> brutal shutdown process. No new requests are accepted from this point.
>>> 
>>> For my setup the shutdown-timeout defaults to 5 seconds and because the 
>>> request still hadn't completed within 5 seconds, then the process is exited 
>>> anyway and allowed to shutdown.
>>> 
>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Aborting process 
>>> 'localhost:8000'.
>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Exiting process 
>>> 'localhost:8000'.
>>> 
>>> Because the application never returned a response, that results in the 
>>> Apache child worker which was trying to talk to the daemon process seeing 
>>> a truncated response.
>>> 
>>> [Wed Jan 28 13:35:10 2015] [error] [client 127.0.0.1] Truncated or 
>>> oversized response headers received from daemon process 'localhost:8000': 
>>> /tmp/mod_wsgi-localhost:8000:502/htdocs/
>>> 
>>> When the Apache parent process notices the daemon process has died, it 
>>> cleans up and starts a new one.
>>> 
>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 
>>> 'localhost:8000' has died, deregister and restart it.
>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 
>>> 'localhost:8000' has been deregistered and will no longer be monitored.
>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29764): Starting process 
>>> 'localhost:8000' with threads=5.
>>> 
>>> So the shutdown phase specified by shutdown-timeout is subsequent to 
>>> eviction-timeout. It is one last chance to shut down during a time when no 
>>> new requests are accepted, in case it is a constant flow of requests that 
>>> is preventing shutdown, rather than one long running request.
>>> 
>>> The shutdown-timeout should always be kept quite short because no new 
>>> requests will be accepted during that time. So changing it from the default 
>>> isn't something one would normally do.
>>> 
>>> Graham
>>> 
>>> On 28/01/2015, at 3:02 AM, Kent <[email protected]> wrote:
>>> 
>>> Let me be more specific.  I'm having a hard time getting this to test as I 
>>> expected.  Here is my WSGIDaemonProcess directive:
>>> 
>>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 
>>> display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 
>>> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>> 
>>> I put a 120 sec sleep in one of the processes' requests and then SIGUSR1 
>>> (Linux) all three processes.  The two inactive ones immediately restart, as 
>>> I expect.  However, the 3rd (sleeping) one is allowed to run past the 60 
>>> second eviction-timeout and runs straight to the graceful-timeout before it 
>>> is terminated.  Shouldn't it have been killed at 60 sec?
>>> 
>>> (And then, as my previous question, how does shutdown-timeout factor into 
>>> all this?)
>>> 
>>> Thanks again!
>>> Kent
>>> 
>>> 
>>> 
>>> On Tuesday, January 27, 2015 at 9:34:12 AM UTC-5, Kent wrote:
>>> I think I might understand the difference between 'graceful-timeout' and 
>>> 'shutdown-timeout', but can you please just clarify the difference?  Are 
>>> they additive?
>>> 
>>> Also, will 'eviction-timeout' interact with either of those, or simply 
>>> override them?
>>> 
>>> Thanks,
>>> Kent
>>> 
>>> On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>>> Want to give:
>>> 
>>>     https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>>> 
>>> a go?
>>> 
>>> The WSGIDaemonProcess directive is 'eviction-timeout'. For mod_wsgi-express 
>>> the command line option is '--eviction-timeout'.
>>> 
>>> So the terminology I am using around this is that sending the signal is 
>>> like forcibly evicting the WSGI application, allowing the process to be 
>>> restarted. At least this way we can have an option name that is distinct 
>>> enough from a generic 'restart' so as not to be confusing.
>>> 
>>> Graham
>>> 
>>> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>>> 
>>> 
>>> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>> 
>>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>> 
>>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>> There are a few possibilities here of how this could be enhanced/changed.
>>> 
>>> The problem with maximum-requests is that it can be dangerous. People can 
>>> set it too low and when their site gets a big spike of traffic then the 
>>> processes can be restarted too quickly only adding to the load of the site 
>>> and causing things to slow down and hamper their ability to handle the 
>>> spike. This is where setting a longer amount of time for graceful-timeout 
>>> helps because you can set it to be quite large. The use of maximum-requests 
>>> can still be like using a hammer though, and one which can be applied 
>>> unpredictably.
>>> 
>>> Yes, I can see that. (It may be overkill, but you could default a separate 
>>> minimum-lifetime parameter so only users who specifically mess with that as 
>>> well as maximum-requests shoot themselves in the foot, but it is starting 
>>> to get confusing with all the different timeouts, I'll agree there...)
>>>  
>>> 
>>> The minimum-lifetime option is an interesting idea. It may have to do 
>>> nothing by default to avoid conflicts with existing expected behaviour.
>>> 
>>> 
>>> The maximum-requests option also doesn't help in the case where you are 
>>> running background threads which do stuff and it is them and not the number 
>>> of requests coming in that dictate things like memory growth that you want 
>>> to counter.
>>> 
>>> 
>>> True, but solving with maximum lifetime... well, actually, solving memory 
>>> problems with any of these mechanisms isn't measuring the heart of the 
>>> problem, which is RAM.  I imagine there isn't a good way to measure RAM or 
>>> you would have added that option by now.  It seems what we are truly after 
>>> in the majority of these cases isn't how many requests have been served or 
>>> how long it's been up, etc., but how much RAM it is taking (or perhaps, 
>>> optionally, average RAM per thread, instead).  If my process exceeds 
>>> 1.5GB, then trigger a graceful restart at the next appropriate 
>>> convenience, being gentle to existing requests.  That may arguably be the 
>>> most useful parameter.
>>> 
>>> 
>>> The problem with calculating memory is that there isn't one cross platform 
>>> portable way of doing it. On Linux you have to dive into the /proc file 
>>> system. On MacOS X you can use C API calls. On Solaris I think you again 
>>> need to dive into a /proc file system but it obviously has a different file 
>>> structure for getting details out compared to Linux. Adding such cross 
>>> platform stuff in gets a bit messy.
>>> 
>>> What I was moving towards, as an extension of the monitoring stuff I am 
>>> doing for mod_wsgi, was to have a special daemon process you can set up 
>>> which has access to some sort of management API. You could then create 
>>> your own Python script that runs in that process and which, using the 
>>> management API, can get daemon process pids and then use the Python psutil 
>>> module to get memory usage on a periodic basis; you then decide if a 
>>> process should be restarted and send it a signal to stop. Or a management 
>>> API could be provided which allows you to notify the daemon process in 
>>> some way, maybe by signal, or maybe using a shared memory flag, that it 
>>> should shut down.
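The external-monitor idea can be sketched, Linux-only and without psutil, by reading resident memory straight from /proc and sending the graceful restart signal; the 1.5 GB threshold and how pids are obtained are assumptions for illustration:

```python
import os
import signal

def rss_bytes(pid):
    """Resident set size of `pid`, via /proc/<pid>/statm (Linux only)."""
    with open("/proc/%d/statm" % pid) as f:
        resident_pages = int(f.read().split()[1])  # 2nd field: RSS in pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")

def evict_if_oversized(pid, limit=1536 * 1024 * 1024):
    """Send the graceful restart signal if the process exceeds `limit`."""
    if rss_bytes(pid) > limit:
        os.kill(pid, signal.SIGUSR1)  # mod_wsgi treats this as an eviction
        return True
    return False
```

A real monitor would run this on a periodic timer over the daemon process pids; psutil's `Process.memory_info().rss` would replace `rss_bytes` portably.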
>>> 
>>> 
>>> I figured there was something making that a pain...
>>>  
>>> So the other option I have contemplated adding a number of times is one to 
>>> periodically restart the process. The way this would work is that a 
>>> process restart would be done periodically based on the time specified. 
>>> You could therefore say the restart interval was 3600 and it would restart 
>>> the process once an hour.
>>> 
>>> The start of the time period for this would either be when the process was 
>>> created, if any Python code or a WSGI script was preloaded at process 
>>> start time; or it would be from when the first request arrived, if the 
>>> WSGI application was lazily loaded. This restart-interval could be tied to 
>>> the graceful-timeout option so that you can set an extended period if you 
>>> want to try and ensure that requests are not interrupted.
>>> 
>>> We just wouldn't want it to die having never even served a single request, 
>>> so my vote would be against the birth of the process as the beginning point 
>>> (and, rather, at first request).
>>> 
>>> 
>>> It would effectively be from the first request if lazily loaded. If 
>>> preloaded though, as background threads could be created which do stuff 
>>> and consume memory over time, it would then be from when the process 
>>> started, i.e., when the Python code was preloaded.
>>> 
>>> 
>>> But then, for preloaded processes, they would cycle themselves for no 
>>> reason throughout inactive periods, like maybe overnight.  That's not the 
>>> end of the world, but I wonder if we're catering to the wrong design. 
>>> (These are, after all, webserver processes, so it seems a fair assumption 
>>> that they exist primarily to handle requests, else why even run under 
>>> apache?)  My vote, for what it's worth, would still be timed from first 
>>> request, but I probably won't use that particular option.  Either way 
>>> would be useful for some, I'm sure.
>>>  
>>> 
>>> Now we have the ability to send the process a graceful restart signal 
>>> (usually SIGUSR1), to force an individual process to restart.
>>> 
>>> Right now this is tied to the graceful-timeout duration as well, which as 
>>> you point out, would perhaps be better off having its own time duration for 
>>> the notional grace period.
>>> 
>>> Using the name restart-timeout for this could be confusing if I have a 
>>> restart interval option.
>>> 
>>> 
>>> In my opinion, SIGUSR1 is different from the automatic parameters because 
>>> it was (most likely) triggered by user intervention, so that one should 
>>> ideally have its own parameter.  If that is the case and this parameter 
>>> becomes dedicated to SIGUSR1, then the least ambiguous name I can think of 
>>> is sigusr1-timeout.
>>>  
>>> 
>>> Except that it isn't guaranteed to be called SIGUSR1. Technically it could 
>>> be a different signal dependent on platform that Apache runs as. But then, 
>>> as far as I know all UNIX systems do use SIGUSR1.
>>> 
>>> 
>>> In any case, they are "signals": do you like signal-timeout? (It could also 
>>> be taken ambiguously, but maybe less so than restart-timeout?)
>>>  
>>> I also have another type of process restart I am trying to work out how to 
>>> accommodate and the naming of options again complicates the problem. In 
>>> this case we want to introduce an artificial restart delay.
>>> 
>>> This would be an option to combat the problem being caused by Django 1.7, 
>>> in that WSGI script file loading for Django isn't stateless. If a transient 
>>> problem occurs, such as the database not being ready, the loading of the 
>>> WSGI script file can fail. On the next request an attempt is made to load 
>>> it again, but now Django kicks up a stink because it was halfway through 
>>> setting things up when it failed last time, and the setup code cannot be 
>>> run a second time. The result is that the process then keeps failing.
>>> 
>>> The idea of the restart delay option therefore is to allow you to set it to 
>>> a number of seconds, normally just 1. If set like that, and a WSGI script 
>>> file import fails, it will effectively block for the delay specified, and 
>>> when that is over it will kill the process, so the whole process is thrown 
>>> away and the WSGI script file can be reloaded in a fresh process. This gets 
>>> rid of the problem of Django initialisation not being able to be retried.
>>> 
>>> 
>>> (We are using turbogears... I don't think I've seen something like that 
>>> happen, but rarely have seen start up anomalies.)
>>>  
>>> A delay is needed to avoid an effective fork bomb, where a WSGI script file 
>>> not loading under high request throughput would cause a constant cycle of 
>>> processes dying and being replaced. It is possible it wouldn't be as bad as 
>>> I think, as Apache only checks for dead processes to replace once a second, 
>>> but I still prefer my own failsafe in case that changes.
>>> 
>>> I am therefore totally fine with a separate graceful time period for when 
>>> SIGUSR1 is used; I just need to juggle these different features and come up 
>>> with an option naming scheme that makes sense.
>>> 
>>> How about then that I add the following new options:
>>> 
>>>     maximum-lifetime - Similar to maximum-requests in that it will cause 
>>> the processes to be shut down and restarted, but in this case it will occur 
>>> based on the time period given as argument, measured from the first 
>>> request, or, where the WSGI script file or any other Python code was 
>>> preloaded, from when the process was started.
>>> 
>>>     restart-timeout - Specifies a separate grace period for when the 
>>> process is being forcibly restarted using the graceful restart signal. If 
>>> restart-timeout is not specified and graceful-timeout is specified, then 
>>> the value of graceful-timeout is used. If neither is specified, then the 
>>> restart signal will behave similarly to the process being sent a SIGINT.
>>> 
>>>     linger-timeout - When a WSGI script file, or other Python code, is 
>>> being imported by mod_wsgi directly, if that fails the default is that the 
>>> error is ignored. For a WSGI script file, reloading will be attempted on 
>>> the next request. But if preloading code, it will fail and merely be 
>>> logged. If linger-timeout is set to a non-zero value, with the value being 
>>> seconds, then the daemon process will instead be shut down and restarted, 
>>> to try and allow a successful reloading of the code to occur if it was a 
>>> transient issue. To avoid a fork bomb in the case of a persistent issue, a 
>>> delay will be introduced based on the value of the linger-timeout option.
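>>> 
>>> As a sketch only (these option names are the proposal above, not yet a 
>>> released mod_wsgi feature, and the values are illustrative), a daemon 
>>> process group using all three might look like:
>>> 
>>>     WSGIDaemonProcess rarch processes=3 threads=2 maximum-lifetime=86400 
>>> restart-timeout=60 linger-timeout=1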
>>>  
>>> How does that all sound, if it makes sense that is. :-)
>>> 
>>> 
>>> 
>>> That sounds absolutely great!  How would I get on the notification cc: of 
>>> the ticket or whatever so I'd be informed of progress on that?
>>> 
>>> These days my turnaround time is pretty quick so long as I am happy and 
>>> know what to change and how. So I just need to think a bit more about it 
>>> and get some day job stuff out of the way before I can do something.
>>> 
>>> So don't be surprised if you simply get a reply to this email within a week 
>>> pointing at a development version to try.
>>> 
>>> 
>>> Well tons of thanks again.
>>>  
>>> Graham
>>> 
>>>  
>>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>> 
>>> Thanks again.  Yes, I did take our current version from the repo because 
>>> you hadn't released the SIGUSR1 bit yet...  I should upgrade now.
>>> 
>>> As for the very long graceful-timeout, I was skirting around that solution 
>>> because I like where the setting is currently for SIGUSR1.  I would like to 
>>> ask, "Is there a way to indicate a different graceful-timeout for handling 
>>> SIGUSR1 vs. maximum-requests?" but I already have the answer from the 
>>> release notes: "No."
>>> 
>>> I don't know if you can see the value in distinguishing the two, but 
>>> maximum-requests is sort of a "standard operating mode," so it might make 
>>> sense for a mod_wsgi user to want a higher, very gentle mode of operation 
>>> there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still 
>>> "means business," so the same user may want a shorter, more responsive 
>>> timeout there (while still allowing a decent chunk of time for being 
>>> graceful to running requests).  That is the case for me, at least.  Any 
>>> chance you'd entertain that as a feature request?
>>> 
>>> Even if not, you've been extremely helpful, thank you!  And thanks for 
>>> pointing me to the correct version of the documentation: I thought I was 
>>> reading the current version.
>>> Kent
>>> 
>>> P.S. I also have ideas for possible vertical URL partitioning, but 
>>> unfortunately, our app has much cross-over by URL, so that's why I'm down 
>>> this maximum-requests path...
>>> 
>>> 
>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>> 
>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>> 
>>> I'm running 4 (a very early version of it, possibly before you officially 
>>> released it).   We upgraded to take advantage of the amazingly-helpful 
>>> SIGUSR1 signaling for graceful process restarting, which we use somewhat 
>>> regularly to gracefully deploy software changes (minor ones which won't 
>>> matter if 2 processes have different versions loaded) without disrupting 
>>> users.  Thanks a ton for that!
>>> 
>>> SIGUSR1 support was released in version 4.1.0.
>>> 
>>>     
>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>> 
>>> That same version has all the other stuff which was changed, so long as 
>>> the actual 4.1.0 release is being used and you aren't still using the repo 
>>> from before the official release.
>>> 
>>> If not sure, best to just upgrade to the latest version if you can.
>>> 
>>> We are also multi-threading our processes (plural processes, plural 
>>> threads).
>>> 
>>> Some requests could be (validly) running for very long periods of time 
>>> (database reporting, maybe even half hour, though that would be very 
>>> extreme).
>>> 
>>> Some processes (especially those generating .pdfs, for example) hog tons 
>>> of RAM, as you know, so I'd like these to eventually check their RAM back 
>>> in, so to speak, by utilizing either inactivity-timeout or 
>>> maximum-requests, but always in a very gentle way, since, as I mentioned, 
>>> some requests might be properly running, even if for many minutes.  
>>> maximum-requests seems too brutal for my use case, since the threshold 
>>> request sends the process down the graceful-timeout/shutdown-timeout path, 
>>> even if there are valid requests running, and then SIGKILLs.  My ideal 
>>> vision of "maximum-requests," since it is primarily for memory management, 
>>> is to be very gentle, sort of a "ok, now that I've hit my threshold, at my 
>>> next earliest convenience, I should die, but only once all my current 
>>> requests have ended of their own accord."
>>> 
>>> That is where, if you vertically partition those URLs out to a separate 
>>> daemon process group, you can simply set a very high graceful-timeout 
>>> value.
>>> 
>>> So relying on the feature:
>>> 
>>> """
>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is 
>>> applied in a number of circumstances.
>>> 
>>> When maximum-requests and this option are used together, when maximum 
>>> requests is reached, rather than immediately shutting down, potentially 
>>> interrupting active requests if they don't finish within the shutdown 
>>> timeout, you can specify a separate graceful shutdown period. If all the 
>>> requests are completed within this time frame then it will shut down 
>>> immediately, otherwise normal forced shutdown kicks in. In some respects 
>>> this is just allowing a separate shutdown timeout for cases where requests 
>>> could be interrupted, to avoid that if possible.
>>> """
>>> 
>>> You could set:
>>> 
>>>     maximum-requests=20 graceful-timeout=600
>>> 
>>> So as soon as it hits 20 requests, it switches to a mode where it will 
>>> restart once there are no active requests. You can set that timeout as 
>>> high as you want, even hours, and it will not care.
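>>> 
>>> Putting the partitioning and those settings together, a sketch (the URL 
>>> and group names here are illustrative) might be:
>>> 
>>>     WSGIDaemonProcess rarch processes=3 threads=2
>>>     WSGIDaemonProcess rarch-reports processes=6 threads=1 
>>> maximum-requests=20 graceful-timeout=600
>>> 
>>>     WSGIProcessGroup rarch
>>> 
>>>     <Location /reports>
>>>     WSGIProcessGroup rarch-reports
>>>     </Location>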
>>> 
>>> "inactivity-timeout" seems to function exactly as I want, in that it seems 
>>> like it won't ever kill a process with a thread with an active request (at 
>>> least, I can't get it to, even by adding a long 
>>> import time; time.sleep(longtime)... it doesn't seem to die until the 
>>> request is finished).  But that's why the documentation made me nervous, 
>>> because it implies that it could, in fact, kill a proc with an active 
>>> request: "For the purposes of this option, being idle means no new 
>>> requests being received, or no attempts by current requests to read 
>>> request content or generate response content for the defined period."
>>> 
>>> The release notes for 4.1.0 say:
>>> 
>>> """
>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in 
>>> the daemon process being restarted after the idle timeout period where 
>>> there are no active requests. Previously it would also interrupt a long 
>>> running request. See the new request-timeout option for a way of 
>>> interrupting long running, potentially blocked requests and restarting the 
>>> process.
>>> """
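>>> 
>>> So, as a sketch, an idle restart combined with a last-resort failsafe for 
>>> stuck requests might look like (values illustrative):
>>> 
>>>     WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 
>>> request-timeout=3600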
>>> 
>>> I'd rather have a more gentle "maximum-requests" than "inactivity-timeout" 
>>> because then, even on very heavy days (when RAM is most likely to choke), I 
>>> could gracefully turn over these processes a couple times a day, which I 
>>> couldn't do with "inactivity-timeout" on an extremely heavy day.
>>> 
>>> Hope this makes sense.  I'm really asking:
>>> 
>>> 1. whether inactivity-timeout triggering will ever SIGKILL a process with 
>>> an active request, as the docs intimate
>>> 
>>> No, from 4.1.0 onwards.
>>> 
>>> 2. whether there is any way to get maximum-requests to behave more gently 
>>> under all circumstances
>>> 
>>> By setting a very, very long graceful-timeout.
>>> 
>>> 3. for your ideas/best advice
>>> 
>>> Have a good read through the release notes for 4.1.0.
>>> 
>>> Not necessarily useful in your case, but also look at request-timeout. It 
>>> can act as a final fail safe for when things are stuck, but since it is 
>>> more of a fail safe, it doesn't make use of graceful-timeout.
>>> 
>>> Graham
>>> 
>>> 
>>> Thanks for your help!
>>> 
>>> 
>>> 
>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>> 
>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote: 
>>> 
>>> > Graham, the docs state: "For the purposes of this option, being idle 
>>> > means no new requests being received, or no attempts by current requests 
>>> > to read request content or generate response content for the defined 
>>> > period."   
>>> > 
>>> > This implies to me that a running request that is taking a long time 
>>> > could actually be killed as if it were idle (suppose it were fetching a 
>>> > very slow database query).  Is this the case? 
>>> 
>>> This is the case for mod_wsgi prior to version 4.0. 
>>> 
>>> Things have changed in mod_wsgi 4.X. 
>>> 
>>> How long are your long running requests though? The inactivity-timeout was 
>>> more about restarting infrequently used applications so that memory can be 
>>> taken back. 
>>>  
>>> 
>>> > Also, I'm looking for an ultra-conservative and graceful method of 
>>> > recycling memory. I've read your article on url partitioning, which was 
>>> > useful, but sooner or later, one must rely on either inactivity-timeout 
>>> > or maximum-requests, is that accurate?  But both these will eventually, 
>>> > after graceful timeout/shutdown timeout, potentially kill active 
>>> > requests.  It is valid for our app to handle long-running reports, so I 
>>> > was hoping for an ultra-safe mechanism. 
>>> > Do you have any advice here? 
>>> 
>>> The options available in mod_wsgi 4.X are much better in this area than 
>>> 3.X. The changes in 4.X aren't covered in the main documentation though, 
>>> and are only described in the release notes where each change was made. 
>>> 
>>> In 4.X the concept of an inactivity-timeout is now separate from the idea 
>>> of a request-timeout. There is also a graceful-timeout that can be applied 
>>> to maximum-requests, and some other situations as well, to allow requests 
>>> to finish up properly before being more brutal. One can also signal the 
>>> daemon processes to do a more graceful restart as well. 
>>> 
>>> You cannot totally avoid having to be brutal, though, and kill things, 
>>> else you don't have a fail safe for a stuck process where all request 
>>> threads are blocked on back end services and are never going to recover. 
>>> Use of multithreading in a process also complicates the implementation of 
>>> request-timeout. 
>>> 
>>> Anyway, the big question is what version are you using? 
>>> 
>>> Graham 
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "modwsgi" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/modwsgi.
>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
> 
