Re: [modwsgi] inactivity-timeout

Graham Dumpleton Sun, 25 Jan 2015 21:44:29 -0800

Want to give this a go?

    https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz

The WSGIDaemonProcess directive option is 'eviction-timeout'. For mod_wsgi-express
the command line option is '--eviction-timeout'.

The terminology I am using around this is that sending the signal is like
forcibly evicting the WSGI application, allowing the process to be restarted. At
least this way I can have an option name that is distinct enough from the generic
'restart' so as not to be confusing.
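A minimal Python sketch of triggering an eviction from outside the process, assuming you already know the daemon process PIDs (how you obtain them is not covered here and is an assumption). It only sends the graceful restart signal, SIGUSR1 on the UNIX systems discussed in this thread; the grace period the process then gets is what the eviction-timeout option controls. The evict() helper and its injectable kill parameter are hypothetical, not part of mod_wsgi.

```python
import os
import signal
from typing import Callable, Iterable, List


def evict(pids: Iterable[int],
          sig: int = signal.SIGUSR1,
          kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send the graceful restart signal to each daemon process PID.

    Hypothetical helper: returns the PIDs that were signalled; PIDs that
    no longer exist are skipped instead of raising.
    """
    signalled = []
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            # The process has already exited, so there is nothing to evict.
            continue
        signalled.append(pid)
    return signalled
```

For example, evict([12345, 12346]) with real daemon PIDs (these numbers are placeholders). Passing a stand-in kill function lets you try the logic without signalling anything.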

Graham

On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:

> 
> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
> 
> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
> 
>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>> There are a few possibilities here of how this could be enhanced/changed.
>> 
>> The problem with maximum-requests is that it can be dangerous. People can 
>> set it too low and when their site gets a big spike of traffic then the 
>> processes can be restarted too quickly only adding to the load of the site 
>> and causing things to slow down and hamper their ability to handle the 
>> spike. This is where setting a longer amount of time for graceful-timeout 
>> helps because you can set it to be quite large. The use of maximum-requests 
>> can still be like using a hammer though, and one which can be applied 
>> unpredictably.
>> 
>> Yes, I can see that. (It may be overkill, but you could add a separate
>> minimum-lifetime parameter with a safe default, so only users who specifically
>> mess with that as well as maximum-requests shoot themselves in the foot, but it
>> is starting to get confusing with all the different timeouts, I'll agree there...)
>>  
> 
> The minimum-lifetime option is an interesting idea. It may have to do nothing 
> by default to avoid conflicts with existing expected behaviour.
> 
>> 
>> The maximum-requests option also doesn't help in the case where you are 
>> running background threads which do stuff and it is them and not the number 
>> of requests coming in that dictate things like memory growth that you want 
>> to counter.
>> 
>> 
>> True, but solving with maximum lifetime... well, actually, solving memory
>> problems with any of these mechanisms isn't measuring the heart of the
>> problem, which is RAM.  I imagine there isn't a good way to measure RAM or
>> you would have added that option by now.  Seems what we are truly after for
>> the majority of these isn't how many requests or how long it's been up, etc.,
>> but how much RAM it is taking (or perhaps, optionally, average RAM per
>> thread, instead).  If my process exceeds 1.5GB, then trigger a
>> graceful restart at the next appropriate convenience, being gentle to
>> existing requests.  That may arguably be the most useful parameter.
>> 
> 
> The problem with calculating memory is that there isn't one cross-platform,
> portable way of doing it. On Linux you have to dive into the /proc file
> system. On Mac OS X you can use C API calls. On Solaris I think you again need
> to dive into a /proc file system, but it obviously has a different file
> structure for getting details out compared to Linux. Adding such
> cross-platform stuff in gets a bit messy.
> 
> What I was moving towards, as an extension of the monitoring work I am doing
> for mod_wsgi, was to have a special daemon process you can set up which has
> access to some sort of management API. You could then create your own Python
> script that runs in that process and, using the management API, gets the daemon
> process PIDs and uses Python psutil to check memory usage periodically. You
> then decide whether a process should be restarted and either send it a signal
> to stop, or use a notification mechanism provided by the management API, maybe
> a signal, or maybe a shared memory flag, to tell the daemon process that it
> should shut down.
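A rough sketch of the periodic memory check described above, assuming psutil is installed and that the daemon PIDs come from somewhere (the management API mentioned here is not something this sketch relies on). It uses psutil.Process(pid).memory_info().rss, the resident set size, as the memory measure; over_limit() and rss_bytes() are illustrative names, not mod_wsgi APIs.

```python
from typing import Callable, Dict, Iterable, Optional


def rss_bytes(pid: int) -> Optional[int]:
    """Resident set size of a process via psutil, or None if it has exited."""
    import psutil  # third-party; imported lazily so the rest works without it
    try:
        return psutil.Process(pid).memory_info().rss
    except psutil.NoSuchProcess:
        return None


def over_limit(pids: Iterable[int], limit_bytes: int,
               rss_of: Callable[[int], Optional[int]] = rss_bytes) -> Dict[int, int]:
    """Map each PID whose RSS exceeds limit_bytes to its current RSS."""
    result = {}
    for pid in pids:
        rss = rss_of(pid)
        if rss is not None and rss > limit_bytes:
            result[pid] = rss
    return result
```

A monitoring script might call over_limit(pids, int(1.5 * 1024**3)) once a minute, matching the 1.5GB figure above, and send the graceful restart signal to each PID it returns. RSS also counts resident shared pages, so treat the threshold as approximate.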
> 
> 
> I figured there was something making that a pain...
>  
>> So the other option I have contemplated adding a number of times is one to
>> periodically restart the process. The way this would work is that a
>> process restart would be done periodically based on what time was specified.
>> You could therefore say the restart interval was 3600 and it would restart
>> the process once an hour.
>> 
>> The start of the time period for this would be either when the process was
>> created, if any Python code or a WSGI script was preloaded at process start
>> time, or when the first request arrived, if the WSGI application was lazily
>> loaded. This restart-interval could be tied to the graceful-timeout option
>> so that you can set an extended period if you want to try and ensure that
>> requests are not interrupted.
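The timing rule in this proposal can be sketched as a small Python class. This illustrates the logic only, assuming times are monotonic seconds; RestartInterval and its methods are hypothetical names, and mod_wsgi itself is implemented in C.

```python
from typing import Optional


class RestartInterval:
    """Track when a periodic restart is due (hypothetical sketch).

    Preloaded code: the clock starts when the process starts.
    Lazily loaded application: the clock starts at the first request.
    """

    def __init__(self, interval: float, preloaded: bool) -> None:
        self.interval = interval
        self.preloaded = preloaded
        self.start: Optional[float] = None

    def process_started(self, now: float) -> None:
        if self.preloaded:
            self.start = now

    def request_arrived(self, now: float) -> None:
        # Only the first request matters, and only for lazy loading.
        if not self.preloaded and self.start is None:
            self.start = now

    def restart_due(self, now: float) -> bool:
        return self.start is not None and now - self.start >= self.interval
```

With interval=3600 the restart becomes due once an hour, counted from process start when preloaded and from the first request when lazily loaded. In this proposal, graceful-timeout would then govern how long active requests get to finish.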
>> 
>> We just wouldn't want it to die having never even served a single request, 
>> so my vote would be against the birth of the process as the beginning point 
>> (and, rather, at first request).
>> 
> 
> It would effectively be from first request if lazily loaded. If preloaded
> though, as background threads could be created which do stuff and consume
> memory over time, it would then be from when the process started, i.e., when
> Python code was preloaded.
> 
> 
> But then for preloaded, processes life-cycle themselves for no reason 
> throughout inactive periods like maybe overnight.  That's not the end of the 
> world, but I wonder if we're catering to the wrong design. (These are, after 
> all, webserver processes, so it seems a fair assumption that they exist 
> primarily to handle requests, else why even run under Apache?)  My vote, for
> what it's worth, would still be timed from first request, but I probably 
> won't use that particular option.  Either way would be useful for some I'm 
> sure.
>  
>> 
>> Now we have the ability to send the process the graceful restart signal
>> (usually SIGUSR1) to force an individual process to restart.
>> 
>> Right now this is tied to the graceful-timeout duration as well, which as 
>> you point out, would perhaps be better off having its own time duration for 
>> the notional grace period.
>> 
>> Using the name restart-timeout for this could be confusing if I have a 
>> restart interval option.
>> 
>> 
>> In my opinion, SIGUSR1 is different from the automatic parameters because it 
>> was (most likely) triggered by user intervention, so that one should ideally 
>> have its own parameter.  If that is the case and this parameter becomes 
>> dedicated to SIGUSR1, then the least ambiguous name I can think of is 
>> sigusr1-timeout.
>>  
> 
> Except that it isn't guaranteed to be called SIGUSR1. Technically it could be
> a different signal depending on the platform Apache runs on. But then, as
> far as I know, all UNIX systems do use SIGUSR1.
> 
> 
> In any case, they are "signals": you like signal-timeout? (Also could be 
> taken ambiguously, but maybe less so than restart-timeout?)
>  
>> I also have another type of process restart I am trying to work out how to 
>> accommodate and the naming of options again complicates the problem. In this 
>> case we want to introduce an artificial restart delay.
>> 
>> This would be an option to combat the problem being caused by
>> Django 1.7, in that WSGI script file loading for Django isn't stateless. If a
>> transient problem occurs, such as the database not being ready, the loading
>> of the WSGI script file can fail. On the next request an attempt is made to
>> load it again, but now Django kicks up a stink because it was halfway through
>> setting things up last time when it failed, and the setup code cannot be run
>> a second time. The result is that the process then keeps failing.
>> 
>> The idea of the restart delay option therefore is to allow you to set it to
>> a number of seconds, normally just 1. If set like that and a WSGI script file
>> import fails, it will effectively block for the delay specified and, when
>> that is over, it will kill the process so the whole process is thrown away
>> and the WSGI script file can be reloaded in a fresh process. This gets rid
>> of the problem of Django initialisation not being able to be retried.
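The restart delay idea can be sketched in Python as follows. It is only an illustration under stated assumptions: load stands for whatever imports the WSGI script file, terminate stands for whatever ends the process (for example os.kill(os.getpid(), signal.SIGTERM)), and load_or_linger() is a hypothetical helper, not how mod_wsgi implements it.

```python
import time
from typing import Callable, Optional


def load_or_linger(load: Callable[[], object],
                   delay: float,
                   terminate: Callable[[], None],
                   sleep: Callable[[float], None] = time.sleep) -> Optional[object]:
    """Try to load the WSGI application; on failure, wait, then discard the process.

    load      -- imports the WSGI script file and returns the application
    delay     -- seconds to block before terminating (e.g. 1)
    terminate -- ends this process so that a fresh one gets started
    """
    try:
        return load()
    except Exception:
        if delay <= 0:
            # Default behaviour: let the error propagate (mod_wsgi would log
            # it and retry the import on the next request).
            raise
        # Block first so a persistently broken script cannot cycle
        # processes as fast as requests arrive.
        sleep(delay)
        terminate()
        return None
```

The sleep before terminate() is what limits the fork-bomb effect: a script that can never load costs each process at least delay seconds before it is replaced.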
>> 
>> 
>> (We are using turbogears... I don't think I've seen something like that 
>> happen, but rarely have seen start up anomalies.)
>>  
>> A delay is needed to avoid an effective fork bomb, where a WSGI script file 
>> not loading with high request throughput would cause a constant cycle of 
>> processes dying and being replaced. It is possible it wouldn't be as bad as 
>> I think, as Apache only checks for dead processes to replace once a second,
>> but I still prefer my own failsafe in case that changes.
>> 
>> I am therefore totally fine with a separate graceful time period for when 
>> SIGUSR1 is used, I just need to juggle these different features and come up 
>> with an option naming scheme that makes sense.
>> 
>> How about then that I add the following new options:
>> 
>>     maximum-lifetime - Similar to maximum-requests in that it will cause the 
>> processes to be shutdown and restarted, but in this case it will occur based 
>> on the time period given as argument, measured from the first request or 
>> when the WSGI script file or any other Python code was preloaded, that is, 
>> in the latter case when the process was started.
>> 
>>     restart-timeout - Specifies a separate grace period for when the process
>> is being forcibly restarted using the graceful restart signal. If
>> restart-timeout is not specified and graceful-timeout is specified, then the
>> value of graceful-timeout is used. If neither is specified, then the
>> restart signal will behave similarly to the process being sent a SIGINT.
>> 
>>     linger-timeout - When a WSGI script file, or other Python code, is being
>> imported by mod_wsgi directly and that fails, the default is that the error
>> is ignored. For a WSGI script file, reloading will be attempted on the next
>> request. But if preloading code, then it will fail and merely be logged. If
>> linger-timeout is set to a non-zero value, with the value being in
>> seconds, then the daemon process will instead be shut down and restarted to
>> try and allow a successful reloading of the code to occur if it was a
>> transient issue. To avoid a fork bomb if the issue is persistent, a delay
>> will be introduced based on the value of the linger-timeout option.
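The fallback rule proposed for restart-timeout reads naturally as a small function. A minimal Python sketch, where None stands for "option not given" (an assumption; how mod_wsgi represents an unset option, or treats an explicit 0, is not covered here). Per the reply at the top of this thread, the option eventually shipped under the name eviction-timeout; restart_grace_period() is a hypothetical name.

```python
from typing import Optional


def restart_grace_period(restart_timeout: Optional[float],
                         graceful_timeout: Optional[float]) -> Optional[float]:
    """Grace period applied when the graceful restart signal is received.

    restart-timeout wins if given, otherwise graceful-timeout is used.
    A result of None means no grace period: the process shuts down much
    as if it had been sent SIGINT.
    """
    if restart_timeout is not None:
        return restart_timeout
    return graceful_timeout
```

This keeps the signal-triggered grace period independent of the one used for maximum-requests, which is the distinction asked for further down the thread.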
>>  
>> How does that all sound, if it makes sense that is. :-)
>> 
>> 
>> 
>> That sounds absolutely great!  How would I get on the notification cc: of 
>> the ticket or whatever so I'd be informed of progress on that?
> 
> These days my turnaround time is pretty quick so long as I am happy and know
> what to change and how. So I just need to think a bit more about it and get
> some day job stuff out of the way before I can do something.
> 
> So don't be surprised if you simply get a reply to this email within a week 
> pointing at a development version to try.
> 
> 
> Well tons of thanks again.
>  
> Graham
> 
>> Graham
>> 
>>  
>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>> 
>>> Thanks again.  Yes, I did take our current version from the repo because 
>>> you hadn't released the SIGUSR1 bit yet...  I should upgrade now.
>>> 
>>> As for the very long graceful-timeout, I was skirting around that solution 
>>> because I like where the setting is currently for SIGUSR1.  I would like to 
>>> ask, "Is there a way to indicate a different graceful-timeout for handling 
>>> SIGUSR1 vs. maximum-requests?" but I already have the answer from the 
>>> release notes: "No."
>>> 
>>> I don't know if you can see the value in distinguishing the two, but 
>>> maximum-requests is sort of a "standard operating mode," so it might make 
>>> sense for a modwsgi user to want a higher, very gentle mode of operation 
>>> there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still 
>>> "means business," so the same user may want a shorter responsive timeout 
>>> there (while still allowing a decent chunk of time for being graceful to 
>>> running requests).   That is the case for me at least.  Any chance you'd 
>>> entertain that as a feature request?
>>> 
>>> Even if not, you've been extremely helpful, thank you!  And thanks for 
>>> pointing me to the correct version of documentation: I thought I was 
>>> reading the current version.
>>> Kent
>>> 
>>> P.S. I also have ideas for possible vertical URL partitioning, but 
>>> unfortunately, our app has much cross-over by URL, so that's why I'm down 
>>> this maximum-requests path...
>>> 
>>> 
>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>> 
>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>> 
>>>> I'm running 4 (a very early version of it, possibly before you officially 
>>>> released it).   We upgraded to take advantage of the amazingly-helpful 
>>>> SIGUSR1 signaling for graceful process restarting, which we use somewhat 
>>>> regularly to gracefully deploy software changes (minor ones which won't 
>>>> matter if 2 processes have different versions loaded) without disrupting 
>>>> users.  Thanks a ton for that!
>>> 
>>> SIGUSR1 support was released in version 4.1.0.
>>> 
>>>     http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>> 
>>> That same version has all the other changes, so long as you are using the
>>> actual 4.1.0 release and not still using the repo from before the official
>>> release.
>>> 
>>> If not sure, best to just upgrade to the latest version if you can.
>>> 
>>>> We are also multi-threading our processes (plural processes, plural 
>>>> threads).
>>>> 
>>>> Some requests could be (validly) running for very long periods of time 
>>>> (database reporting, maybe even half hour, though that would be very 
>>>> extreme).
>>>> 
>>>> Some processes (especially those generating .pdfs, for example), hog tons 
>>>> of RAM, as you know, so I'd like these to eventually check their RAM back 
>>>> in, so to speak, by utilizing either inactivity-timeout or 
>>>> maximum-requests, but always in a very gentle way, since, as I mentioned, 
>>>> some requests might be properly running, even though for many minutes.  
>>>> maximum-requests seems too brutal for my use-case since the threshold 
>>>> request sends the process down the graceful-timeout/shutdown-timeout, even 
>>>> if there are valid processes running and then SIGKILLs.  My ideal vision 
>>>> of "maximum-requests," since it is primarily for memory management, is to 
>>>> be very gentle, sort of a "ok, now that I've hit my threshold, at my next 
>>>> earliest convenience, I should die, but only once all my current requests 
>>>> have ended of their own accord."
>>> 
>>> That is where, if you vertically partition those URLs out to a separate
>>> daemon process group, you can simply set a very high graceful-timeout
>>> value.
>>> 
>>> So relying on the feature:
>>> 
>>> """
>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is 
>>> applied in a number of circumstances.
>>> 
>>> When maximum-requests and this option are used together, when maximum
>>> requests is reached, rather than immediately shutdown, potentially
>>> interrupting active requests if they don’t finish within the shutdown
>>> timeout, can specify a separate graceful shutdown period. If all requests are
>>> completed within this time frame then will shutdown immediately, otherwise
>>> normal forced shutdown kicks in. In some respects this is just allowing a
>>> separate shutdown timeout on cases where requests could be interrupted and
>>> could avoid it if possible.
>>> """
>>> 
>>> You could set:
>>> 
>>>     maximum-requests=20 graceful-timeout=600
>>> 
>>> So as soon as it hits 20 requests, it switches to a mode where it will
>>> restart as soon as there are no active requests. You can set that timeout
>>> as high as you want, even hours, and it will not care.
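The behaviour described for maximum-requests=20 graceful-timeout=600 can be modelled as a small state machine. A Python sketch under assumptions: requests are counted when they start, time is in seconds, and "force" stands in for the normal forced shutdown (shutdown-timeout and then SIGKILL), which is not modelled. GracefulRecycler is a hypothetical name, not mod_wsgi code.

```python
from typing import Optional


class GracefulRecycler:
    """maximum-requests combined with graceful-timeout (hypothetical sketch).

    After max_requests requests, the process drains: it restarts as soon
    as no requests are active, or is forced down once graceful_timeout
    seconds have passed since draining began.
    """

    def __init__(self, max_requests: int, graceful_timeout: float) -> None:
        self.max_requests = max_requests
        self.graceful_timeout = graceful_timeout
        self.accepted = 0
        self.active = 0
        self.draining_since: Optional[float] = None

    def request_started(self, now: float) -> None:
        self.accepted += 1
        self.active += 1
        if self.accepted >= self.max_requests and self.draining_since is None:
            self.draining_since = now

    def request_finished(self) -> None:
        self.active -= 1

    def action(self, now: float) -> str:
        """Return 'run', 'restart' (graceful) or 'force' (grace period over)."""
        if self.draining_since is None:
            return "run"
        if self.active == 0:
            return "restart"
        if now - self.draining_since >= self.graceful_timeout:
            return "force"
        return "run"
```

With these numbers, the usual case restarts as soon as the last active request completes, and only a request still running 600 seconds after draining began takes the forced path. Raising graceful_timeout to hours changes nothing else in the logic.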
>>> 
>>>> "inactivity-timeout" seems to function exactly as I want in that it seems
>>>> like it won't ever kill a process with a thread with an active request (at
>>>> least, I can't get it to, even by adding a long import
>>>> time; time.sleep(longtime)); it doesn't seem to die until the request is
>>>> finished. But that's why the documentation made me nervous, because it
>>>> implies that it could, in fact, kill a proc with an active request: "For
>>>> the purposes of this option, being idle means no new requests being
>>>> received, or no attempts by current requests to read request content or
>>>> generate response content for the defined period."
>>> 
>>> The release notes for 4.1.0 say:
>>> 
>>> """
>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in 
>>> the daemon process being restarted after the idle timeout period where 
>>> there are no active requests. Previously it would also interrupt a long 
>>> running request. See the new request-timeout option for a way of 
>>> interrupting long running, potentially blocked requests and restarting the 
>>> process.
>>> """
>>> 
>>>> I'd rather have a more gentle "maximum-requests" than "inactivity-timeout" 
>>>> because then, even on very heavy days (when RAM is most likely to choke), 
>>>> I could gracefully turn over these processes a couple times a day, which I 
>>>> couldn't do with "inactivity-timeout" on an extremely heavy day.
>>>> 
>>>> Hope this makes sense.  I'm really asking:
>>>> whether inactivity-timeout triggering will ever SIGKILL a process with an
>>>> active request, as the docs intimate
>>> No, from 4.1.0 onwards.
>>>> whether there is any way to get maximum-requests to behave more gently 
>>>> under all circumstances
>>> By setting a very very long graceful-timeout.
>>>> for your ideas/best advice
>>> Have a good read through the release notes for 4.1.0.
>>> 
>>> Not necessarily useful in your case, but also look at request-timeout. It 
>>> can act as a final fail safe for when things are stuck, but since it is 
>>> more of a fail safe, it doesn't make use of graceful-timeout.
>>> 
>>> Graham
>>> 
>>> 
>>>> Thanks for your help!
>>>> 
>>>> 
>>>> 
>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>>> 
>>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote: 
>>>> 
>>>> > Graham, the docs state: "For the purposes of this option, being idle 
>>>> > means no new requests being received, or no attempts by current requests 
>>>> > to read request content or generate response content for the defined 
>>>> > period."   
>>>> > 
>>>> > This implies to me that a running request that is taking a long time 
>>>> > could actually be killed as if it were idle (suppose it were fetching a 
>>>> > very slow database query).  Is this the case? 
>>>> 
>>>> This is the case for mod_wsgi prior to version 4.0. 
>>>> 
>>>> Things have changed in mod_wsgi 4.X. 
>>>> 
>>>> How long are your long running requests though? The inactivity-timeout was 
>>>> more about restarting infrequently used applications so that memory can be 
>>>> taken back. 
>>>>  
>>>> 
>>>> > Also, I'm looking for an ultra-conservative and graceful method of 
>>>> > recycling memory. I've read your article on url partitioning, which was 
>>>> > useful, but sooner or later, one must rely on either inactivity-timeout 
>>>> > or maximum-requests, is that accurate?  But both these will eventually, 
>>>> > after graceful timeout/shutdown timeout, potentially kill active 
>>>> > requests.  It is valid for our app to handle long-running reports, so I 
>>>> > was hoping for an ultra-safe mechanism. 
>>>> > Do you have any advice here? 
>>>> 
>>>> The options available in mod_wsgi 4.X are much better in this area than 
>>>> 3.X. The changes in 4.X aren't covered in main documentation though and 
>>>> are only described in the release notes where change was made. 
>>>> 
>>>> In 4.X the concept of an inactivity-timeout is now separate to the idea of 
>>>> a request-timeout. There is also a graceful-timeout that can be applied to 
>>>> maximum-requests and some other situations as well to allow requests to 
>>>> finish out properly before being more brutal. One can also signal the 
>>>> daemon processes to do a more graceful restart as well. 
>>>> 
>>>> You cannot totally avoid having to be brutal, though, and kill things, else
>>>> you don't have a fail safe for a stuck process where all request threads
>>>> were blocked on back end services and were never going to recover. Use of
>>>> multithreading in a process also complicates the implementation of 
>>>> request-timeout. 
>>>> 
>>>> Anyway, the big question is what version are you using? 
>>>> 
>>>> Graham 
>>>> 
>>>> 
>>>> -- 
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>> For more options, visit https://groups.google.com/d/optout.
>>> 
>> 
> 