Re: [modwsgi] inactivity-timeout

Kent Mon, 26 Jan 2015 05:04:29 -0800

Excellent.  I will certainly try this out, thanks!


On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>
> Want to give:
>
>     https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>
> a go?
>
> The WSGIDaemonProcess directive option is 'eviction-timeout'. For
> mod_wsgi-express the command line option is '--eviction-timeout'.
>
> So the terminology I am using around this is that sending a signal is like
> forcibly evicting the WSGI application, allowing the process to be restarted.
> At least this way I can have an option name that is distinct enough from the
> generic 'restart' so as not to be confusing.
>
> Graham
>
> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>
>
> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>
>>
>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>
>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>>
>>> There are a few possibilities here of how this could be enhanced/changed.
>>>
>>> The problem with maximum-requests is that it can be dangerous. People 
>>> can set it too low and when their site gets a big spike of traffic then the 
>>> processes can be restarted too quickly only adding to the load of the site 
>>> and causing things to slow down and hamper their ability to handle the 
>>> spike. This is where setting a longer amount of time for graceful-timeout 
>>> helps because you can set it to be quite large. The use of maximum-requests 
>>> can still be like using a hammer though, and one which can be applied 
>>> unpredictably.
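To put rough numbers on the spike problem: the lifetime of a process under maximum-requests is just the limit divided by the rate of requests that process handles, so a limit that looks generous at normal load can mean near-constant recycling during a spike. This is an illustrative calculation only; the limit of 1000 and the request rates are hypothetical, not taken from any real deployment.

```python
def seconds_between_restarts(maximum_requests: int, requests_per_second: float) -> float:
    """Approximate lifetime of one daemon process under maximum-requests,
    assuming a steady per-process request rate (a simplification)."""
    if requests_per_second <= 0:
        return float("inf")  # no traffic: the limit is never reached
    return maximum_requests / requests_per_second


if __name__ == "__main__":
    # Hypothetical limit and per-process rates (quiet period vs. spike).
    for rate in (1.0, 10.0, 100.0):
        print(f"{rate:6.1f} req/s -> restart every "
              f"{seconds_between_restarts(1000, rate):7.1f} s")
```

At 100 requests per second per process, a limit of 1000 means a restart roughly every 10 seconds, each one adding start-up work at exactly the moment the site is busiest.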
>>>
>>
>> Yes, I can see that. (It may be overkill, but you could default a 
>> separate minimum-lifetime parameter so only users who specifically mess 
>> with that as well as maximum-requests shoot themselves in the foot, but it 
>> is starting to get confusing with all the different timeouts, I'll agree 
>> there...)
>>  
>>
>>
>> The minimum-lifetime option is an interesting idea. It may have to do 
>> nothing by default to avoid conflicts with existing expected behaviour.
>>
>>
>>> The maximum-requests option also doesn't help in the case where you are 
>>> running background threads which do stuff and it is them and not the number 
>>> of requests coming in that dictate things like memory growth that you want 
>>> to counter.
>>>
>>>
>> True, but solving with maximum lifetime... well, actually, solving memory
>> problems with *any* of these mechanisms isn't measuring the heart of the
>> problem, which is RAM.  I imagine there isn't a good way to measure RAM or
>> you would have added that option by now.  Seems what we are truly after for
>> the majority of these isn't how many requests or how long it's been up, etc,
>> but how much RAM it is taking (or perhaps, optionally, average RAM per
>> thread, instead).  If my process exceeds 1.5GB of RAM, then trigger a
>> graceful restart at the next appropriate convenience, being gentle to
>> existing requests.  That is arguably the most useful parameter.
>>
>>
>> The problem with calculating memory is that there isn't one cross 
>> platform portable way of doing it. On Linux you have to dive into the /proc 
>> file system. On MacOS X you can use C API calls. On Solaris I think you 
>> again need to dive into a /proc file system but it obviously has a 
>> different file structure for getting details out compared to Linux. Adding 
>> such cross platform stuff in gets a bit messy.
>>
>> What I was moving towards, as an extension of the monitoring work I am
>> doing for mod_wsgi, was to have a special daemon process you can set up
>> which has access to some sort of management API. You could then create your
>> own Python script that runs in that process and, using the management API,
>> gets the daemon process pids and then uses Python psutil to get memory
>> usage on a periodic basis. You then decide if a process should be restarted
>> and send it a signal to stop, or the management API could provide a way to
>> notify the daemon process that it should shut down, maybe by a signal or
>> maybe by a shared memory flag.
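A rough sketch of the kind of monitoring script described above, under several assumptions: the management API does not exist here, so the daemon process pids are simply passed in; memory is read from /proc/<pid>/status, so this is Linux-only (psutil would be the portable route mentioned above); and the 1.5GB limit is the figure from Kent's message. All function names are hypothetical, and SIGUSR1 is assumed to be the graceful restart signal, as discussed further down the thread.

```python
import os
import signal
from typing import Iterable, List

# Hypothetical limit: the 1.5GB figure mentioned earlier in the thread.
RSS_LIMIT_BYTES = int(1.5 * 1024 ** 3)


def parse_vmrss(status_text: str) -> int:
    """Return the resident set size in bytes from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("no VmRSS line (kernel thread or exited process?)")


def rss_bytes(pid: int) -> int:
    """Linux-only: read a process's current RSS from the /proc file system."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss(f.read())


def should_evict(rss: int, limit: int = RSS_LIMIT_BYTES) -> bool:
    """Decide whether a process has grown past the memory limit."""
    return rss > limit


def check_and_signal(pids: Iterable[int], limit: int = RSS_LIMIT_BYTES,
                     sig: int = signal.SIGUSR1) -> List[int]:
    """Send the graceful restart signal to every process over the limit.

    The pids would come from the (hypothetical) management API; here they
    are just passed in. Returns the pids that were signalled.
    """
    signalled = []
    for pid in pids:
        try:
            if should_evict(rss_bytes(pid), limit):
                os.kill(pid, sig)
                signalled.append(pid)
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited between listing and checking
    return signalled
```

A driver loop would call check_and_signal periodically, for example once a minute, and leave the actual shutdown to mod_wsgi's graceful handling of the signal.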
>>
>>
> I figured there was something making that a pain...
>  
>
>>> So the other option I have contemplated adding a number of times is
>>> one to periodically restart the process. The way this would work is that a
>>> process restart would be done periodically based on what time was
>>> specified. You could therefore say the restart interval was 3600 and it
>>> would restart the process once an hour.
>>>
>>> The start of the time period for this would be either when the process
>>> was created, if any Python code or a WSGI script was preloaded at process
>>> start time, or when the first request arrived, if the WSGI application
>>> was lazily loaded. This restart-interval could be tied to the
>>> graceful-timeout option so that you can set an extended period if you want
>>> to try and ensure that requests are not interrupted.
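The proposed start point for the restart interval can be sketched as a small clock object: it starts at process creation when code was preloaded, otherwise at the first request. This illustrates the rule as described in this message and is not mod_wsgi's implementation; the class and method names are hypothetical, and times are passed in as plain seconds so the logic can be exercised without waiting.

```python
import time
from typing import Optional


class LifetimeClock:
    """Hypothetical model of the proposed restart-interval timing."""

    def __init__(self, interval: float, preloaded: bool,
                 now: Optional[float] = None) -> None:
        self.interval = interval
        now = time.monotonic() if now is None else now
        # Preloaded code: the clock starts with the process.
        # Lazily loaded: the clock waits for the first request.
        self.start: Optional[float] = now if preloaded else None

    def on_request(self, now: Optional[float] = None) -> None:
        if self.start is None:
            self.start = time.monotonic() if now is None else now

    def due(self, now: Optional[float] = None) -> bool:
        """True once the interval has elapsed since the clock started."""
        if self.start is None:
            return False  # never served a request, so never restart
        now = time.monotonic() if now is None else now
        return now - self.start >= self.interval
```

With interval=3600 and lazy loading, a process that sits unused overnight never becomes due, which is the behaviour Kent argues for below; with preloading it is due an hour after start regardless of traffic.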
>>>
>>
>> We just wouldn't want it to die having never even served a single
>> request, so my vote would be *against* the birth of the process as the
>> beginning point (and for the first request instead).
>>
>>
>> It would effectively be from the first request if lazily loaded. If
>> preloaded though, since background threads could be created which do stuff
>> and consume memory over time, it would then be from when the process
>> started, i.e., when the Python code was preloaded.
>>
>>
> But then for preloaded, processes life-cycle themselves for no reason 
> throughout inactive periods like maybe overnight.  That's not the end of 
> the world, but I wonder if we're catering to the wrong design. (These are, 
> after all, webserver processes, so it seems a fair assumption that they 
> exist primarily to handle requests, else why even run under apache?)  My 
> vote, for what it's worth, would still be timed from first request, but I 
> probably won't use that particular option.  Either way would be useful for 
> some I'm sure.
>  
>
>>
>>> We now have the ability to send the process the graceful restart signal
>>> (usually SIGUSR1), to force an individual process to restart.
>>>
>>> Right now this is tied to the graceful-timeout duration as well, which 
>>> as you point out, would perhaps be better off having its own time duration 
>>> for the notional grace period.
>>>
>>> Using the name restart-timeout for this could be confusing if I have a 
>>> restart interval option.
>>>
>>>
>> In my opinion, SIGUSR1 is different from the automatic parameters because 
>> it was (most likely) triggered by user intervention, so that one should 
>> ideally have its own parameter.  If that is the case and this parameter 
>> becomes dedicated to SIGUSR1, then the least ambiguous name I can think of 
>> is *sigusr1-timeout*.
>>  
>>
>>
>> Except that it isn't guaranteed to be called SIGUSR1. Technically it
>> could be a different signal depending on the platform that Apache runs on.
>> But then, as far as I know, all UNIX systems do use SIGUSR1.
>>
>>
> In any case, they are "signals": do you like *signal-timeout*? (It could
> also be taken ambiguously, but maybe less so than restart-timeout?)
>  
>
>>> I also have another type of process restart I am trying to work out how
>>> to accommodate, and the naming of options again complicates the problem. In
>>> this case we want to introduce an artificial restart delay.
>>>
>>> This would be an option to combat the problem which is being caused by
>>> Django 1.7 in that WSGI script file loading for Django isn't stateless. If
>>> a transient problem occurs, such as the database not being ready, the
>>> loading of the WSGI script file can fail. On the next request an attempt is
>>> made to load it again, but now Django kicks up a stink because it was
>>> halfway through setting things up last time when it failed, and the setup
>>> code cannot be run a second time. The result is that the process then
>>> keeps failing.
>>>
>>> The idea of the restart delay option therefore is to allow you to set it
>>> to a number of seconds, normally just 1. If set like that, if a WSGI script
>>> file import fails, it will effectively block for the delay specified and
>>> when that is over it will kill the process, so the whole process is thrown
>>> away and the WSGI script file can be reloaded in a fresh process. This gets
>>> rid of the problem of Django initialisation not being able to be retried.
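In Python terms, the restart delay behaviour would look roughly like the sketch below. mod_wsgi would do this inside the daemon process itself, so load_or_linger and its injectable sleep and exit_process parameters are hypothetical and exist only to make the logic easy to follow and test. The pause before exiting is what keeps a persistently broken WSGI script file from turning into a rapid restart loop.

```python
import os
import sys
import time
import traceback
from typing import Any, Callable


def load_or_linger(load: Callable[[], Any], delay: float,
                   sleep: Callable[[float], None] = time.sleep,
                   exit_process: Callable[[int], None] = os._exit) -> Any:
    """Try to load the application. On failure, log the error, wait
    `delay` seconds, then kill the whole process so that a fresh process
    can retry the import from a clean state."""
    try:
        return load()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        if delay <= 0:
            raise  # no delay configured: just report the error as before
        sleep(delay)  # throttle restarts if the failure is persistent
        exit_process(1)  # discard the half-initialised process
        return None  # only reached when exit_process is stubbed out
```

Exiting rather than retrying in place matters for frameworks like Django 1.7, whose setup cannot be re-run in a process where it already failed halfway.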
>>>
>>>
>> (We are using turbogears... I don't think I've seen something like that 
>> happen, but rarely have seen start up anomalies.)
>>  
>>
>>> A delay is needed to avoid an effective fork bomb, where a WSGI script
>>> file not loading with high request throughput would cause a constant cycle
>>> of processes dying and being replaced. It is possible it wouldn't be as bad
>>> as I think, as Apache only checks for dead processes to replace once a
>>> second, but I still prefer my own failsafe in case that changes.
>>>
>>> I am therefore totally fine with a separate graceful time period for
>>> when SIGUSR1 is used; I just need to juggle these different features and
>>> come up with an option naming scheme that makes sense.
>>>
>>> How about then that I add the following new options:
>>>
>>>     maximum-lifetime - Similar to maximum-requests in that it will cause
>>> the processes to be shut down and restarted, but in this case it will occur
>>> based on the time period given as argument, measured from the first request
>>> or, if the WSGI script file or any other Python code was preloaded, from
>>> when the process was started.
>>>
>>>     restart-timeout - Specifies a separate grace period for when the
>>> process is being forcibly restarted using the graceful restart signal. If
>>> restart-timeout is not specified and graceful-timeout is specified, then
>>> the value of graceful-timeout is used. If neither is specified, then the
>>> restart signal will behave similarly to the process being sent a SIGINT.
>>>
>>>     linger-timeout - When a WSGI script file or other Python code is
>>> being imported by mod_wsgi directly and that fails, the default is that the
>>> error is ignored. For a WSGI script file, reloading will be attempted on the
>>> next request. But if preloading code, then it will fail and merely be
>>> logged. If linger-timeout is set to a non-zero value, with the value
>>> being seconds, then the daemon process will instead be shut down and
>>> restarted to try and allow a successful reloading of the code to occur if
>>> it was a transient issue. To avoid a fork bomb if the issue is persistent, a
>>> delay will be introduced based on the value of the linger-timeout option.
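The fallback rule proposed for restart-timeout amounts to a simple lookup. A minimal sketch, with None standing for "option not given"; the function name is hypothetical, and it only encodes the rule as proposed in this message (the option that eventually shipped is named eviction-timeout, and this sketch makes no claim about its exact fallback behaviour).

```python
from typing import Optional


def restart_grace_period(restart_timeout: Optional[float],
                         graceful_timeout: Optional[float]) -> Optional[float]:
    """Grace period applied when the graceful restart signal arrives.

    Returns None for "no grace period", i.e. the signal is handled much
    like a SIGINT would be.
    """
    if restart_timeout is not None:
        return restart_timeout  # explicit setting wins
    if graceful_timeout is not None:
        return graceful_timeout  # otherwise reuse graceful-timeout
    return None
```

This is also what gives Kent the split he asked for: a long graceful-timeout for maximum-requests and a shorter restart-timeout for signal-driven restarts.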
>>>  
>>>
>> How does that all sound, if it makes sense that is. :-)
>>>
>>>
>>
>> That sounds absolutely great!  How would I get on the notification cc: of 
>> the ticket or whatever so I'd be informed of progress on that?
>>
>>
>> These days my turnaround time is pretty quick so long as I am happy and
>> know what to change and how. So I just need to think a bit more about it
>> and get some day job stuff out of the way before I can do something.
>>
>> So don't be surprised if you simply get a reply to this email within a 
>> week pointing at a development version to try.
>>
>>
> Well tons of thanks again.
>  
>
>> Graham
>>>
>>>  
>>
>>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>>
>>> Thanks again.  Yes, I did take our current version from the repo because 
>>> you hadn't released the SIGUSR1 bit yet...  I should upgrade now.
>>>
>>> As for the very long graceful-timeout, I was skirting around that 
>>> solution because I like where the setting is currently for SIGUSR1.  I 
>>> would like to ask, "Is there a way to indicate a different graceful-timeout 
>>> for handling SIGUSR1 vs. maximum-requests?" but I already have the 
>>> answer from the release notes: "No."
>>>
>>> I don't know if you can see the value in distinguishing the two, but 
>>> maximum-requests 
>>> is sort of a "standard operating mode," so it might make sense for a 
>>> modwsgi user to want a higher, very gentle mode of operation there, whereas 
>>> SIGUSR1, while beautifully more graceful than SIGKILL, still "means 
>>> business," so the same user may want a shorter responsive timeout there 
>>> (while still allowing a decent chunk of time for being graceful to running 
>>> requests).   That is the case for me at least.  Any chance you'd entertain 
>>> that as a feature request?
>>>
>>> Even if not, you've been extremely helpful, thank you!  And thanks for 
>>> pointing me to the correct version of documentation: I thought I was 
>>> reading current version.
>>> Kent
>>>
>>> P.S. I also have ideas for possible vertical URL partitioning, but 
>>> unfortunately, our app has much cross-over by URL, so that's why I'm down 
>>> this maximum-requests path...
>>>
>>>
>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>>
>>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>>>
>>>> I'm running 4 (a very early version of it, possibly before you 
>>>> officially released it).   We upgraded to take advantage of the 
>>>> amazingly-helpful SIGUSR1 signaling for graceful process restarting, 
>>>> which we use somewhat regularly to gracefully deploy software changes 
>>>> (minor ones which won't matter if 2 processes have different versions 
>>>> loaded) without disrupting users.  Thanks a ton for that!
>>>>
>>>>
>>>> SIGUSR1 support was released in version 4.1.0.
>>>>
>>>>     
>>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>>>
>>>> That same version has all the other stuff which was changed, so long as
>>>> the actual 4.1.0 release is being used and you aren't still using the repo
>>>> from before the official release.
>>>>
>>>> If not sure, it's best to just upgrade to the latest version if you can.
>>>>
>>>> We are also multi-threading our processes (plural processes, plural 
>>>> threads).
>>>>
>>>> Some requests could be (validly) running for very long periods of time 
>>>> (database reporting, maybe even half hour, though that would be very 
>>>> extreme).
>>>>
>>>> Some processes (especially those generating .pdfs, for example) hog
>>>> tons of RAM, as you know, so I'd like these to eventually check their RAM
>>>> back in, so to speak, by utilizing either inactivity-timeout or
>>>> maximum-requests, but always in a very gentle way, since, as I
>>>> mentioned, some requests might be properly running, even though for many
>>>> minutes.  maximum-requests seems too brutal for my use-case since the
>>>> threshold request sends the process down the
>>>> graceful-timeout/shutdown-timeout path, even if there are valid requests
>>>> running, and then SIGKILLs.  My ideal vision of "maximum-requests,"
>>>> since it is *primarily for memory management*, is to be very gentle,
>>>> sort of a "ok, now that I've hit my threshold, at my next earliest
>>>> convenience, I should die, but only once all my current requests have
>>>> ended of their own accord."
>>>>
>>>>
>>>> That is where, if you vertically partition those URLs out to a separate
>>>> daemon process group, you can simply set a very high graceful-timeout
>>>> value.
>>>>
>>>> So relying on the feature:
>>>>
>>>> """
>>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is 
>>>> applied in a number of circumstances.
>>>>
>>>> When maximum-requests and this option are used together, when maximum 
>>>> requests is reached, rather than immediately shutdown, potentially 
>>>> interupting active requests if they don’t finished with shutdown timeout, 
>>>> can specify a separate graceful shutdown period. If the all requests are 
>>>> completed within this time frame then will shutdown immediately, otherwise 
>>>> normal forced shutdown kicks in. In some respects this is just allowing a 
>>>> separate shutdown timeout on cases where requests could be interrupted and 
>>>> could avoid it if possible.
>>>> """
>>>>
>>>> You could set:
>>>>
>>>>     maximum-requests=20 graceful-timeout=600
>>>>
>>>> So as soon as it hits 20 requests, it switches to a mode where it will
>>>> restart as soon as there are no active requests. You can set that timeout
>>>> as high as you want, even hours, and it will not care.
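The behaviour described for maximum-requests=20 graceful-timeout=600 can be modelled as a small state machine: once the request limit is reached, the process restarts as soon as it has no active requests, and only falls back to a forced shutdown if the grace period runs out first. This is a sketch of the described semantics with hypothetical names, not mod_wsgi code; it does not model whether new requests are still accepted while draining, nor the shutdown-timeout that follows a forced shutdown.

```python
from typing import Optional


class MaxRequestsPolicy:
    """Hypothetical model of maximum-requests combined with graceful-timeout."""

    def __init__(self, maximum_requests: int, graceful_timeout: float) -> None:
        self.maximum_requests = maximum_requests
        self.graceful_timeout = graceful_timeout
        self.served = 0
        self.active = 0
        self.draining_since: Optional[float] = None  # set when limit is hit

    def request_started(self, now: float) -> None:
        self.active += 1
        self.served += 1
        if self.served >= self.maximum_requests and self.draining_since is None:
            self.draining_since = now  # enter the graceful period

    def request_finished(self) -> None:
        self.active -= 1

    def action(self, now: float) -> str:
        if self.draining_since is None:
            return "keep running"
        if self.active == 0:
            return "restart now"  # all requests done: restart immediately
        if now - self.draining_since >= self.graceful_timeout:
            return "forced shutdown"  # grace period exhausted
        return "waiting for active requests"
```

Raising graceful_timeout to hours only widens the waiting window; a process with no active requests still restarts straight away.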
>>>>
>>>> "inactivity-timeout" seems to function exactly as I want in that it
>>>> seems like it won't ever kill a process with a thread with an active
>>>> request (at least, I can't get it to, even by adding a long import
>>>> time; time.sleep(longtime)... it doesn't seem to die until the request
>>>> is finished).  But that's why the documentation made me nervous, because it
>>>> implies that it *could*, in fact, kill a proc with an active request:
>>>> "For the purposes of this option, being idle means no new requests being
>>>> received, or no attempts by current requests to read request content or
>>>> generate response content for the defined period."
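A minimal WSGI script for repeating this experiment, assuming it is mounted in a daemon process group with a short inactivity-timeout: request it with a seconds query parameter longer than the timeout and watch whether the process restarts mid-request. The seconds parameter is an invention for this test. Per the 4.1.0 release notes quoted in the reply, the process should not be restarted until the request completes.

```python
import time
from urllib.parse import parse_qs


def application(environ, start_response):
    """Test app: sleep for ?seconds=N before responding, to check whether a
    long-running request survives inactivity-timeout."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    seconds = float(params.get("seconds", ["0"])[0])
    time.sleep(seconds)  # simulate a slow report or database query
    body = f"slept {seconds} seconds\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```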
>>>>
>>>>
>>>> The release notes for 4.1.0 say:
>>>>
>>>> """
>>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results 
>>>> in the daemon process being restarted after the idle timeout period where 
>>>> there are no active requests. Previously it would also interrupt a long 
>>>> running request. See the new request-timeout option for a way of 
>>>> interrupting long running, potentially blocked requests and restarting the 
>>>> process.
>>>> """
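One reading of that release note, as a sketch: the idle clock only runs while no requests are active, so a long-running request can never be mistaken for inactivity. The class is hypothetical, and measuring idleness from when the last active request finished is an assumption about the exact starting point, not a statement of how mod_wsgi implements it.

```python
class InactivityMonitor:
    """Hypothetical model of the 4.1.0 inactivity-timeout rule."""

    def __init__(self, timeout: float, now: float) -> None:
        self.timeout = timeout
        self.active = 0
        self.idle_since = now  # when the active request count last hit zero

    def request_started(self) -> None:
        self.active += 1

    def request_finished(self, now: float) -> None:
        self.active -= 1
        if self.active == 0:
            self.idle_since = now  # idle period starts only now

    def should_restart(self, now: float) -> bool:
        # Never restart while any request is still in progress.
        return self.active == 0 and now - self.idle_since >= self.timeout
```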
>>>>
>>>> I'd rather have a more gentle "maximum-requests" than 
>>>> "inactivity-timeout" because then, even on very heavy days (when RAM is 
>>>> most likely to choke), I could gracefully turn over these processes a 
>>>> couple times a day, which I couldn't do with "inactivity-timeout" on an 
>>>> extremely heavy day.
>>>>
>>>> Hope this makes sense.  I'm really asking:
>>>>
>>>>    1. whether inactivity-timeout triggering will ever SIGKILL a
>>>>    process with an active request, as the docs intimate
>>>>
>>>> No, not from 4.1.0 onwards.
>>>>
>>>>
>>>>    2. whether there is any way to get maximum-requests to behave more
>>>>    gently under all circumstances
>>>>
>>>> By setting a very, very long graceful-timeout.
>>>>
>>>>
>>>>    3. for your ideas/best advice
>>>>
>>>> Have a good read through the release notes for 4.1.0.
>>>>
>>>> Not necessarily useful in your case, but also look at request-timeout. 
>>>> It can act as a final fail safe for when things are stuck, but since it is 
>>>> more of a fail safe, it doesn't make use of graceful-timeout.
>>>>
>>>> Graham
>>>>
>>>>
>>>> Thanks for your help!
>>>>
>>>>
>>>>
>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton 
>>>> wrote:
>>>>>
>>>>>
>>>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote: 
>>>>>
>>>>> > Graham, the docs state: "For the purposes of this option, being idle
>>>>> > means no new requests being received, or no attempts by current
>>>>> > requests to read request content or generate response content for the
>>>>> > defined period."
>>>>> >
>>>>> > This implies to me that a running request that is taking a long time
>>>>> > could actually be killed as if it were idle (suppose it were fetching a
>>>>> > very slow database query).  Is this the case?
>>>>>
>>>>> This is the case for mod_wsgi prior to version 4.0. 
>>>>>
>>>>> Things have changed in mod_wsgi 4.X. 
>>>>>
>>>>> How long are your long running requests though? The inactivity-timeout
>>>>> was more about restarting infrequently used applications so that memory
>>>>> can be taken back.
>>>>>
>>>>  
>>>>
>>>>>
>>>>> > Also, I'm looking for an ultra-conservative and graceful method of
>>>>> > recycling memory. I've read your article on URL partitioning, which was
>>>>> > useful, but sooner or later, one must rely on either inactivity-timeout
>>>>> > or maximum-requests, is that accurate?  But both of these will
>>>>> > eventually, after graceful timeout/shutdown timeout, potentially kill
>>>>> > active requests.  It is valid for our app to handle long-running
>>>>> > reports, so I was hoping for an ultra-safe mechanism.
>>>>> > Do you have any advice here?
>>>>>
>>>>> The options available in mod_wsgi 4.X are much better in this area
>>>>> than 3.X. The changes in 4.X aren't covered in the main documentation
>>>>> though and are only described in the release notes for the version where
>>>>> the change was made.
>>>>>
>>>>> In 4.X the concept of an inactivity-timeout is now separate to the 
>>>>> idea of a request-timeout. There is also a graceful-timeout that can be 
>>>>> applied to maximum-requests and some other situations as well to allow 
>>>>> requests to finish out properly before being more brutal. One can also 
>>>>> signal the daemon processes to do a more graceful restart as well. 
>>>>>
>>>>> You cannot totally avoid having to be brutal though and kill things,
>>>>> or else you don't have a fail safe for a stuck process where all request
>>>>> threads were blocked on back end services and were never going to
>>>>> recover. Use of multithreading in a process also complicates the
>>>>> implementation of request-timeout.
>>>>>
>>>>> Anyway, the big question is what version are you using? 
>>>>>
>>>>> Graham 
>>>>>
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

