Excellent. I will certainly try this out, thanks!
On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>
> Want to give:
>
> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>
> a go?
>
> The WSGIDaemonProcess directive option is 'eviction-timeout'. For mod_wsgi-express the command line option is '--eviction-timeout'.
>
> So the terminology I am using around this is that sending a signal is like forcibly evicting the WSGI application, allowing the process to be restarted. At least this way I can have an option name that is distinct enough from the generic 'restart' so as not to be confusing.
>
> Graham
>
> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>
> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>
>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>
>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>>
>>> There are a few possibilities here for how this could be enhanced or changed.
>>>
>>> The problem with maximum-requests is that it can be dangerous. People can set it too low, and when their site gets a big spike of traffic the processes can be restarted too quickly, only adding to the load on the site, causing things to slow down and hampering its ability to handle the spike. This is where setting a longer graceful-timeout helps, because you can set it to be quite large. The use of maximum-requests can still be like using a hammer though, and one which can be applied unpredictably.
>>
>> Yes, I can see that. (It may be overkill, but you could give a separate minimum-lifetime parameter a safe default, so that only users who specifically change it as well as maximum-requests shoot themselves in the foot. But it is starting to get confusing with all the different timeouts, I'll agree there...)
>>
>> The minimum-lifetime option is an interesting idea.
>> It may have to do nothing by default to avoid conflicts with existing expected behaviour.
>>
>>> The maximum-requests option also doesn't help in the case where you are running background threads which do stuff, and it is they, and not the number of requests coming in, that dictate things like the memory growth you want to counter.
>>
>> True, but solving with maximum lifetime... well, actually, solving memory problems with *any* of these mechanisms isn't measuring the heart of the problem, which is RAM. I imagine there isn't a good way to measure RAM or you would have added that option by now. It seems what we are truly after, for the majority of these, isn't how many requests or how long it's been up, etc., but how much RAM it is taking (or perhaps, optionally, average RAM per thread instead). If my process exceeds 1.5GB, then trigger a graceful restart at the next appropriate convenience, being gentle to existing requests. That may arguably be the most useful parameter.
>>
>> The problem with calculating memory is that there isn't one cross-platform, portable way of doing it. On Linux you have to dive into the /proc file system. On Mac OS X you can use C API calls. On Solaris I think you again need to dive into a /proc file system, but it obviously has a different file structure for getting details out compared to Linux. Adding such cross-platform code gets a bit messy.
>>
>> What I was moving towards, as an extension of the monitoring work I am doing for mod_wsgi, was to have a special daemon process you can set up which has access to some sort of management API.
>> You could then create your own Python script that runs in that process and which, using the management API, can get the daemon process pids and then use Python psutil to get memory usage on a periodic basis. You would then decide if a process should be restarted and send it a signal to stop; or a management API function could be provided which allows you to notify the daemon process in some way, maybe by signal, or maybe using a shared memory flag, that it should shut down.
>
> I figured there was something making that a pain...
>
>>> So the other option I have contemplated adding a number of times is one to periodically restart the process. The way this would work is that a process restart would be done periodically based on the time specified. You could therefore say the restart interval was 3600 and it would restart the process once an hour.
>>>
>>> The start of the time period for this would either be when the process was created, if any Python code or a WSGI script was preloaded at process start time, or it would be from when the first request arrived, if the WSGI application was lazily loaded. This restart-interval could be tied to the graceful-timeout option so that you can set an extended period if you want to try and ensure that requests are not interrupted.
>>
>> We just wouldn't want it to die having never even served a single request, so my vote would be *against* the birth of the process as the beginning point (and, rather, for the first request).
>>
>> It would effectively be from the first request if lazily loaded. If preloaded though, as background threads could be created which do stuff and consume memory over time, it would then be from when the process started, i.e., when the Python code was preloaded.
>
> But then for preloaded, processes life-cycle themselves for no reason throughout inactive periods, like maybe overnight. That's not the end of the world, but I wonder if we're catering to the wrong design.
> (These are, after all, webserver processes, so it seems a fair assumption that they exist primarily to handle requests; else why even run under Apache?) My vote, for what it's worth, would still be timed from the first request, but I probably won't use that particular option. Either way would be useful for some, I'm sure.
>
>>> Now we have the ability to send the process the graceful restart signal (usually SIGUSR1) to force an individual process to restart.
>>>
>>> Right now this is tied to the graceful-timeout duration as well, which, as you point out, would perhaps be better off having its own time duration for the notional grace period.
>>>
>>> Using the name restart-timeout for this could be confusing if I have a restart interval option.
>>
>> In my opinion, SIGUSR1 is different from the automatic parameters because it was (most likely) triggered by user intervention, so that one should ideally have its own parameter. If that is the case and this parameter becomes dedicated to SIGUSR1, then the least ambiguous name I can think of is *sigusr1-timeout*.
>>
>> Except that it isn't guaranteed to be called SIGUSR1. Technically it could be a different signal depending on the platform Apache runs on. But then, as far as I know, all UNIX systems do use SIGUSR1.
>
> In any case, they are "signals": do you like *signal-timeout*? (It could also be taken ambiguously, but maybe less so than restart-timeout?)
>
>>> I also have another type of process restart I am trying to work out how to accommodate, and the naming of options again complicates the problem. In this case we want to introduce an artificial restart delay.
>>>
>>> This would be an option to combat the problem being caused by Django 1.7, in that WSGI script file loading for Django isn't stateless. If a transient problem occurs, such as the database not being ready, the loading of the WSGI script file can fail.
>>> On the next request an attempt is made to load it again, but now Django kicks up a stink because it was halfway through setting things up last time when it failed, and the setup code cannot be run a second time. The result is that the process then keeps failing.
>>>
>>> The idea of the restart delay option, therefore, is to allow you to set it to a number of seconds, normally just 1. If it is set like that and a WSGI script file import fails, it will effectively block for the specified delay, and when that is over it will kill the process, so the whole process is thrown away and the WSGI script file can be reloaded in a fresh process. This gets rid of the problem of Django initialisation not being able to be retried.
>>
>> (We are using TurboGears... I don't think I've seen something like that happen, but we have, rarely, seen start-up anomalies.)
>>
>>> A delay is needed to avoid an effective fork bomb, where a WSGI script file failing to load under high request throughput would cause a constant cycle of processes dying and being replaced. It is possible it wouldn't be as bad as I think, as Apache only checks for dead processes to replace once a second, but I still prefer my own failsafe in case that changes.
>>>
>>> I am therefore totally fine with a separate graceful time period for when SIGUSR1 is used; I just need to juggle these different features and come up with an option naming scheme that makes sense.
>>>
>>> How about then that I add the following new options:
>>>
>>> maximum-lifetime - Similar to maximum-requests in that it will cause the processes to be shut down and restarted, but in this case it will occur based on the time period given as the argument, measured from the first request, or from when the WSGI script file or any other Python code was preloaded, that is, in the latter case, from when the process was started.
>>>
>>> restart-timeout - Specifies a separate grace period for when the process is being forcibly restarted using the graceful restart signal. If restart-timeout is not specified and graceful-timeout is specified, then the value of graceful-timeout is used. If neither is specified, then the restart signal will behave similarly to the process being sent a SIGINT.
>>>
>>> linger-timeout - When a WSGI script file, or other Python code, is being imported by mod_wsgi directly and that fails, the default is that the error is ignored. For a WSGI script file, reloading will be attempted on the next request. But if preloading code, then it will fail and merely be logged. If linger-timeout is set to a non-zero value, in seconds, then the daemon process will instead be shut down and restarted, to allow a successful reload of the code to occur if it was a transient issue. To avoid a fork bomb if the issue is persistent, a delay will be introduced based on the value of the linger-timeout option.
>>>
>>> How does that all sound, if it makes sense that is. :-)
>>
>> That sounds absolutely great! How would I get on the notification cc: of the ticket or whatever so I'd be informed of progress on that?
>>
>> These days my turnaround time is pretty quick so long as I am happy and know what to change and how. So I just need to think a bit more about it and get some day-job stuff out of the way before I can do something.
>>
>> So don't be surprised if you simply get a reply to this email within a week pointing at a development version to try.
>
> Well, tons of thanks again.
>
>> Graham
>>
>>> Graham
>>>
>>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>>
>>> Thanks again. Yes, I did take our current version from the repo because you hadn't released the SIGUSR1 bit yet... I should upgrade now.
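The three options Graham proposes above (maximum-lifetime, restart-timeout and linger-timeout) each boil down to a small decision rule, sketched here in Python purely as an illustration of the behaviour described in the thread, not of mod_wsgi's own implementation, which is in C. (Graham's later message at the top of the thread says the restart-timeout idea shipped as eviction-timeout.) All function names are hypothetical, `load` stands in for importing the WSGI script file, and modelling "behave like SIGINT" as "no grace period" (`None`) is an assumption.

```python
import logging
import os
import time
from typing import Callable, Optional


def lifetime_expired(now: float, process_started: float,
                     first_request: Optional[float], preloaded: bool,
                     maximum_lifetime: float) -> bool:
    """Proposed maximum-lifetime rule (hypothetical helper).

    The clock starts at process start if code was preloaded, otherwise at the
    first request; a lazily loaded process that has served nothing never expires.
    """
    start = process_started if preloaded else first_request
    if start is None:
        return False
    return now - start >= maximum_lifetime


def restart_grace_period(restart_timeout: Optional[float],
                         graceful_timeout: Optional[float]) -> Optional[float]:
    """Grace period (seconds) used when the graceful restart signal arrives.

    restart-timeout wins if set, else graceful-timeout; None stands for
    "no grace period", i.e. behave as if the process were sent SIGINT.
    """
    if restart_timeout is not None:
        return restart_timeout
    return graceful_timeout


def load_or_linger(load: Callable[[], object], linger_timeout: float,
                   sleep: Callable[[float], None] = time.sleep,
                   exit_process: Callable[[int], None] = os._exit) -> Optional[object]:
    """Import the application; on failure, just log it, or linger and then exit."""
    try:
        return load()
    except Exception:
        logging.exception("WSGI script file failed to load")
        if linger_timeout > 0:
            # Delay first, so a persistent failure cannot become a fork bomb of
            # processes dying and being respawned back to back.
            sleep(linger_timeout)
            # Throw the whole process away; a fresh process gets a clean import,
            # which matters when initialisation can't be retried (Django 1.7).
            exit_process(1)
        # Default behaviour: the error is only logged; a lazily loaded WSGI
        # script file would be tried again on the next request.
        return None
```

For example, with only `graceful-timeout=600` configured, `restart_grace_period(None, 600)` returns 600, matching the proposed fallback. The `sleep` and `exit_process` parameters exist only so the rule can be exercised without actually stalling or killing the interpreter.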
>>>
>>> As for the very long graceful-timeout, I was skirting around that solution because I like where the setting currently is for SIGUSR1. I would like to ask, "Is there a way to indicate a different graceful-timeout for handling SIGUSR1 vs. maximum-requests?" but I already have the answer from the release notes: "No."
>>>
>>> I don't know if you can see the value in distinguishing the two, but maximum-requests is sort of a "standard operating mode," so it might make sense for a mod_wsgi user to want a higher, very gentle mode of operation there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still "means business," so the same user may want a shorter, more responsive timeout there (while still allowing a decent chunk of time for being graceful to running requests). That is the case for me at least. Any chance you'd entertain that as a feature request?
>>>
>>> Even if not, you've been extremely helpful, thank you! And thanks for pointing me to the correct version of the documentation: I thought I was reading the current version.
>>> Kent
>>>
>>> P.S. I also have ideas for possible vertical URL partitioning, but unfortunately our app has much cross-over by URL, so that's why I'm going down this maximum-requests path...
>>>
>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>>>
>>>> I'm running mod_wsgi 4 (a very early version of it, possibly before you officially released it). We upgraded to take advantage of the amazingly helpful SIGUSR1 signaling for graceful process restarting, which we use somewhat regularly to gracefully deploy software changes (minor ones, which won't matter if 2 processes have different versions loaded) without disrupting users. Thanks a ton for that!
>>>>
>>>> SIGUSR1 support was released in version 4.1.0.
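Graham's suggestion earlier in the thread of a separate monitoring script, which reads each daemon process's memory usage and sends the graceful restart signal to processes that have grown too large, could look roughly like the sketch below. Several assumptions apply: it is Linux-only, reading `VmRSS` from `/proc/<pid>/status` (psutil, which Graham mentions, is the portable alternative); the daemon pids are supplied by the caller, since the management API discussed is only an idea; SIGUSR1 is assumed to be the graceful restart signal, as on the UNIX systems mentioned; and the 1.5GB limit is just Kent's example figure. The names `parse_vmrss`, `read_rss` and `evict_over_limit` are hypothetical.

```python
import os
import signal
from typing import Callable, Iterable, List, Optional

# Example limit only: the 1.5GB figure from the thread, in bytes.
RSS_LIMIT_BYTES = int(1.5 * 1024 ** 3)


def parse_vmrss(status_text: str) -> Optional[int]:
    """Extract the resident set size in bytes from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"; the kernel reports kB.
            return int(line.split()[1]) * 1024
    return None


def read_rss(pid: int) -> Optional[int]:
    """Linux-only: RSS of `pid` in bytes, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmrss(f.read())
    except OSError:
        return None


def evict_over_limit(pids: Iterable[int],
                     limit: int = RSS_LIMIT_BYTES,
                     read: Callable[[int], Optional[int]] = read_rss,
                     send: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send SIGUSR1 to every pid whose RSS exceeds `limit`; return those pids."""
    signalled = []
    for pid in pids:
        rss = read(pid)
        if rss is not None and rss > limit:
            send(pid, signal.SIGUSR1)
            signalled.append(pid)
    return signalled
```

A monitor would call `evict_over_limit` on a timer, for example every 60 seconds. Because the signal asks for a graceful restart rather than a kill, each oversized process is recycled using whatever grace period is configured. With psutil installed, `read` could be replaced by `lambda pid: psutil.Process(pid).memory_info().rss`, catching `psutil.NoSuchProcess` for pids that have already exited.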
>>>>
>>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>>>
>>>> That same version has all the other changes, so long as you are using the actual 4.1.0 release and aren't still using the repo from before the official release.
>>>>
>>>> If you're not sure, it's best just to upgrade to the latest version if you can.
>>>>
>>>> We are also multi-threading our processes (plural processes, plural threads).
>>>>
>>>> Some requests could (validly) be running for very long periods of time (database reporting, maybe even half an hour, though that would be very extreme).
>>>>
>>>> Some processes (especially those generating PDFs, for example) hog tons of RAM, as you know, so I'd like these to eventually check their RAM back in, so to speak, by utilizing either inactivity-timeout or maximum-requests, but always in a very gentle way since, as I mentioned, some requests might be properly running, even for many minutes. maximum-requests seems too brutal for my use case, since the threshold request sends the process down the graceful-timeout/shutdown-timeout path, even if there are valid requests running, and then SIGKILLs it. My ideal vision of "maximum-requests," since it is *primarily for memory management*, is to be very gentle, sort of an "OK, now that I've hit my threshold, at my next earliest convenience I should die, but only once all my current requests have ended of their own accord."
>>>>
>>>> That is where, if you vertically partition those URLs out to a separate daemon process group, you can simply set a very high graceful-timeout value.
>>>>
>>>> So, relying on the feature:
>>>>
>>>> """
>>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is applied in a number of circumstances.
>>>>
>>>> When maximum-requests and this option are used together, when maximum-requests is reached, rather than shutting down immediately, potentially interrupting active requests if they don't finish within the shutdown timeout, you can specify a separate graceful shutdown period. If all requests are completed within this time frame then it will shut down immediately; otherwise the normal forced shutdown kicks in. In some respects this is just allowing a separate shutdown timeout for cases where requests could be interrupted, so that interrupting them can be avoided if possible.
>>>> """
>>>>
>>>> You could set:
>>>>
>>>> maximum-requests=20 graceful-timeout=600
>>>>
>>>> So as soon as it hits 20 requests, it switches to a mode where it will restart as soon as there are no active requests. You can set that timeout as high as you want, even hours, and it will not care.
>>>>
>>>> "inactivity-timeout" seems to function exactly as I want, in that it seems like it won't ever kill a process with a thread handling an active request (at least, I can't get it to, even by adding a long import time; time.sleep(longtime)... it doesn't seem to die until the request is finished). But that's why the documentation made me nervous, because it implies that it *could*, in fact, kill a process with an active request: *"For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."*
>>>>
>>>> The release notes for 4.1.0 say:
>>>>
>>>> """
>>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in the daemon process being restarted after the idle timeout period where there are no active requests. Previously it would also interrupt a long running request. See the new request-timeout option for a way of interrupting long running, potentially blocked requests and restarting the process.
>>>> """
>>>>
>>>> I'd rather have a gentler "maximum-requests" than "inactivity-timeout" because then, even on very heavy days (when RAM is most likely to choke), I could gracefully turn over these processes a couple of times a day, which I couldn't do with "inactivity-timeout" on an extremely heavy day.
>>>>
>>>> Hope this makes sense. I'm really asking:
>>>>
>>>> 1. whether inactivity-timeout triggering will ever SIGKILL a process with an active request, as the docs intimate
>>>>
>>>> No, from 4.1.0 onwards.
>>>>
>>>> 2. whether there is any way to get maximum-requests to behave more gently under all circumstances
>>>>
>>>> By setting a very, very long graceful-timeout.
>>>>
>>>> 3. for your ideas/best advice
>>>>
>>>> Have a good read through the release notes for 4.1.0.
>>>>
>>>> Not necessarily useful in your case, but also look at request-timeout. It can act as a final fail-safe for when things are stuck, but since it is more of a fail-safe, it doesn't make use of graceful-timeout.
>>>>
>>>> Graham
>>>>
>>>> Thanks for your help!
>>>>
>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>>>>
>>>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote:
>>>>>
>>>>> > Graham, the docs state: "For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."
>>>>> >
>>>>> > This implies to me that a running request that is taking a long time could actually be killed as if it were idle (suppose it were fetching a very slow database query). Is this the case?
>>>>>
>>>>> This is the case for mod_wsgi prior to version 4.0.
>>>>>
>>>>> Things have changed in mod_wsgi 4.X.
>>>>>
>>>>> How long are your long running requests though?
>>>>> The inactivity-timeout was more about restarting infrequently used applications so that memory can be taken back.
>>>>>
>>>>> > Also, I'm looking for an ultra-conservative and graceful method of recycling memory. I've read your article on URL partitioning, which was useful, but sooner or later one must rely on either inactivity-timeout or maximum-requests, is that accurate? But both of these will eventually, after the graceful timeout/shutdown timeout, potentially kill active requests. It is valid for our app to handle long-running reports, so I was hoping for an ultra-safe mechanism.
>>>>> > Do you have any advice here?
>>>>>
>>>>> The options available in mod_wsgi 4.X are much better in this area than in 3.X. The changes in 4.X aren't covered in the main documentation though, and are only described in the release notes for the version where each change was made.
>>>>>
>>>>> In 4.X the concept of an inactivity-timeout is now separate from the idea of a request-timeout. There is also a graceful-timeout that can be applied to maximum-requests and some other situations as well, to allow requests to finish properly before being more brutal. One can also signal the daemon processes to do a more graceful restart.
>>>>>
>>>>> You cannot totally avoid having to be brutal and kill things, though, else you don't have a fail-safe for a stuck process where all request threads were blocked on back-end services and were never going to recover. Use of multithreading in a process also complicates the implementation of request-timeout.
>>>>>
>>>>> Anyway, the big question is: what version are you using?
>>>>>
>>>>> Graham
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>> For more options, visit https://groups.google.com/d/optout.
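As a rough model of the mod_wsgi 4.1.0 behaviour discussed in this thread, the Python sketch below shows when a daemon process would restart once maximum-requests is reached with a graceful-timeout (for example `maximum-requests=20 graceful-timeout=600`), and how inactivity-timeout only fires when there are no active requests. It is a simplification under stated assumptions: `DaemonState` and its methods are hypothetical, "shutdown" stands for the point where the normal forced shutdown (with its own shutdown-timeout) would begin, and request-timeout is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DaemonState:
    """Hypothetical model of one daemon process's restart bookkeeping."""
    maximum_requests: int
    graceful_timeout: float
    inactivity_timeout: Optional[float] = None
    requests_served: int = 0
    active_requests: int = 0
    last_activity: float = 0.0
    draining_since: Optional[float] = None

    def request_started(self, now: float) -> None:
        self.active_requests += 1
        self.last_activity = now

    def request_finished(self, now: float) -> None:
        self.active_requests -= 1
        self.requests_served += 1
        self.last_activity = now
        if self.draining_since is None and self.requests_served >= self.maximum_requests:
            # Threshold reached: start the graceful period instead of stopping now.
            self.draining_since = now

    def action(self, now: float) -> str:
        """Return 'keep', 'restart' (graceful) or 'shutdown' (forced path)."""
        if self.draining_since is not None:
            if self.active_requests == 0:
                return "restart"
            if now - self.draining_since >= self.graceful_timeout:
                return "shutdown"
            return "keep"
        if (self.inactivity_timeout is not None and self.active_requests == 0
                and now - self.last_activity >= self.inactivity_timeout):
            # From 4.1.0, an idle restart only happens with no active requests.
            return "restart"
        return "keep"
```

With a very large `graceful_timeout`, the "shutdown" branch is effectively never reached, which is the "very, very long graceful-timeout" advice from the thread: the process simply waits until its last active request finishes.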
