Let me be more specific. I'm having a hard time getting this to behave as I
expected in testing. Here is my WSGIDaemonProcess directive:
    WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 \
        display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 \
        python-eggs=/home/rarch/tg2env/lib/python-egg-cache
I put a 120-second sleep in a request handled by one of the processes and
then sent SIGUSR1 (Linux) to all three processes. The two idle ones restart
immediately, as I expect. However, the third (sleeping) one is allowed to
run past the 60-second eviction-timeout and runs all the way to the
graceful-timeout before it is terminated. Shouldn't it have been killed at
60 seconds?
(And, as in my previous question, how does shutdown-timeout factor into
all this?)
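To reproduce this kind of test, a throwaway WSGI application along the following lines can keep one daemon process busy. It is a minimal sketch: the `sleep` query-string parameter and the response format are assumptions for illustration, not part of the setup described above. It reports the pid so you can tell which daemon process handled the request.

```python
import os
import time
from urllib.parse import parse_qs


def application(environ, start_response):
    """Test app: sleep for ?sleep=N seconds (default 120), then report the pid.

    Handy for keeping one daemon process busy while the others stay idle.
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    seconds = float(params.get("sleep", ["120"])[0])
    started = time.monotonic()
    time.sleep(seconds)
    elapsed = time.monotonic() - started
    body = f"pid={os.getpid()} slept={elapsed:.1f}s\n".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Request it with `?sleep=120`, then send SIGUSR1 to the daemon processes (for example with `kill -USR1 <pid>`). If the busy process is evicted at the 60-second mark, that request should not complete normally.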
Thanks again!
Kent
On Tuesday, January 27, 2015 at 9:34:12 AM UTC-5, Kent wrote:
>
> I think I might understand the difference between 'graceful-timeout' and
> 'shutdown-timeout', but can you please just clarify the difference? Are
> they additive?
>
> Also, will 'eviction-timeout' interact with either of those, or simply
> override them?
>
> Thanks,
> Kent
>
> On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>>
>> Want to give:
>>
>> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>>
>> a go?
>>
>> The WSGIDaemonProcess option is 'eviction-timeout'. For
>> mod_wsgi-express the command line option is '--eviction-timeout'.
>>
>> So the terminology I am using around this is that sending the signal is
>> like forcibly evicting the WSGI application, allowing the process to be
>> restarted. At least this way I can have an option name that is distinct
>> enough from the generic 'restart' so as not to be confusing.
>>
>> Graham
>>
>> On 21/01/2015, at 11:15 PM, Kent wrote:
>>
>>
>> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>>
>>>
>>> On 20/01/2015, at 11:50 PM, Kent wrote:
>>>
>>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> There are a few possibilities here of how this could be
>>>> enhanced/changed.
>>>>
>>>> The problem with maximum-requests is that it can be dangerous. People
>>>> can set it too low, and when their site gets a big spike of traffic the
>>>> processes can be restarted too quickly, only adding to the load on the
>>>> site, slowing things down and hampering its ability to handle the spike.
>>>> This is where a longer graceful-timeout helps, because you can set it to
>>>> be quite large. Using maximum-requests can still be like using a hammer
>>>> though, and one which can be applied unpredictably.
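A back-of-the-envelope calculation shows why a low maximum-requests hurts during a spike. This sketch assumes requests are spread across the daemon processes and ignores the cost of loading the application into each replacement process, which is exactly the extra load that makes a spike worse.

```python
def restarts_per_minute(requests_per_second: float, maximum_requests: int) -> float:
    """Approximate restarts per minute caused by maximum-requests.

    Each daemon process restarts after serving `maximum_requests` requests, so
    across the whole process group there is roughly one restart per
    `maximum_requests` requests, however many processes share the load.
    """
    if maximum_requests <= 0:
        raise ValueError("maximum_requests must be positive")
    return 60.0 * requests_per_second / maximum_requests


if __name__ == "__main__":
    for rps in (5, 200):  # quiet period vs. traffic spike
        rate = restarts_per_minute(rps, 20)
        print(f"{rps} req/s with maximum-requests=20: ~{rate:.0f} restarts/minute")
```

With maximum-requests=20 this gives about 15 restarts per minute at 5 requests per second, but about 600 per minute at 200 requests per second.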
>>>>
>>>
>>> Yes, I can see that. (It may be overkill, but you could add a separate
>>> minimum-lifetime parameter with a safe default, so only users who
>>> specifically change that as well as maximum-requests shoot themselves in
>>> the foot. I'll agree it is starting to get confusing with all the
>>> different timeouts, though...)
>>>
>>>
>>>
>>> The minimum-lifetime option is an interesting idea. It may have to do
>>> nothing by default to avoid conflicts with existing expected behaviour.
>>>
>>>
>>>> The maximum-requests option also doesn't help in the case where you are
>>>> running background threads which do work, and it is they, rather than the
>>>> number of incoming requests, that dictate things like the memory growth
>>>> you want to counter.
>>>>
>>>>
>>> True, but solving with maximum lifetime... well, actually, solving
>>> memory problems with any of these mechanisms isn't measuring the heart of
>>> the problem, which is RAM. I imagine there isn't a good way to measure RAM
>>> or you would have added that option by now. It seems what we are truly
>>> after for the majority of these isn't how many requests or how long it's
>>> been up, etc., but how much RAM it is taking (or perhaps, optionally,
>>> average RAM per thread instead). If my process exceeds 1.5GB, then trigger
>>> a graceful restart at the next appropriate convenience, being gentle to
>>> existing requests. That may arguably be the most useful parameter.
>>>
>>>
>>> The problem with calculating memory is that there isn't one portable,
>>> cross-platform way of doing it. On Linux you have to dive into the /proc
>>> file system. On MacOS X you can use C API calls. On Solaris I think you
>>> again need to dive into a /proc file system, but it obviously has a
>>> different file structure for getting details out compared to Linux. Adding
>>> such cross-platform code gets a bit messy.
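For a sense of what the Linux /proc route involves, here is a minimal, Linux-only sketch that reads a process's resident set size from the `VmRSS` line of `/proc/<pid>/status`. It illustrates the platform-specific parsing being described; on other platforms a library such as psutil (`psutil.Process(pid).memory_info().rss`) hides these differences.

```python
import os
import re

# Matches a line such as "VmRSS:\t   1536 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_bytes(status_text: str) -> int:
    """Return the resident set size in bytes from /proc/<pid>/status text."""
    match = _VMRSS.search(status_text)
    if match is None:
        # Kernel threads have no VmRSS line; non-Linux /proc layouts differ.
        raise ValueError("no VmRSS line found")
    return int(match.group(1)) * 1024


def rss_bytes(pid: int) -> int:
    """Linux-only: read a process's resident set size from /proc."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as f:
        return parse_vmrss_bytes(f.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    print(f"this process: {rss_bytes(os.getpid()) / 2**20:.1f} MiB resident")
```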
>>>
>>> What I was moving towards, as an extension of the monitoring work I am
>>> doing for mod_wsgi, was a special daemon process you can set up which has
>>> access to some sort of management API. You could then write your own
>>> Python script that runs in it and uses the management API to get the
>>> daemon process pids, then uses Python psutil to check memory usage on a
>>> periodic basis. You then decide if a process should be restarted and send
>>> it a signal to stop, or use something the management API provides to
>>> notify it, maybe by signal or maybe using a shared memory flag, that the
>>> daemon process should shut down.
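The management API described here is only an idea at this point in the thread, so the following sketch leaves pid discovery to the caller (`get_pids` is a hypothetical stand-in) and takes the memory probe as a parameter, for example `lambda pid: psutil.Process(pid).memory_info().rss`. It signals, with SIGUSR1, every process over a memory limit. It is UNIX-only and must run as a user allowed to signal the daemon processes.

```python
import os
import signal
import time
from typing import Callable, Iterable, List


def evict_large_processes(
    pids: Iterable[int],
    limit_bytes: int,
    rss: Callable[[int], int],
    send: Callable[[int, int], None] = os.kill,
    sig: int = signal.SIGUSR1,
) -> List[int]:
    """Send the graceful restart signal to each pid whose RSS exceeds limit_bytes.

    Returns the list of pids that were signalled.
    """
    signalled = []
    for pid in pids:
        try:
            if rss(pid) > limit_bytes:
                send(pid, sig)
                signalled.append(pid)
        except (ProcessLookupError, FileNotFoundError):
            # The process exited between being listed and being checked.
            continue
    return signalled


def monitor(get_pids: Callable[[], Iterable[int]], limit_bytes: int,
            rss: Callable[[int], int], interval: float = 60.0) -> None:
    """Poll forever; get_pids is a hypothetical stand-in for pid discovery."""
    while True:
        evict_large_processes(get_pids(), limit_bytes, rss)
        time.sleep(interval)
```

A 1.5GB limit as in the earlier message would be `limit_bytes=int(1.5 * 2**30)`. Because an evicted process keeps its memory until it actually exits, a real monitor would probably want to remember which pids it has already signalled rather than signalling them again on every pass.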
>>>
>>>
>> I figured there was something making that a pain...
>>
>>
>>>> So the other option I have contemplated adding a number of times is one
>>>> to periodically restart the process. The restart would be done
>>>> periodically based on the time specified. You could therefore say the
>>>> restart interval was 3600 and it would restart the process once an hour.
>>>>
>>>> The time period would start either when the process was created, if
>>>> any Python code or a WSGI script was preloaded at process start time, or
>>>> when the first request arrived, if the WSGI application was lazily
>>>> loaded. This restart interval could be tied to the graceful-timeout option
>>>> so that you can set an extended period if you want to try to ensure that
>>>> requests are not interrupted.
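The choice of starting point can be modelled with a small clock. This is a hypothetical sketch of the proposed behaviour, not mod_wsgi code: for preloaded code the clock starts at process start, for lazy loading it starts at the first request, and later requests do not move it. Once the restart falls due, the graceful-timeout would govern the actual shutdown.

```python
import time
from typing import Optional


class LifetimeClock:
    """Hypothetical model of when a periodic restart would fall due."""

    def __init__(self, interval: float, preloaded: bool,
                 now: Optional[float] = None) -> None:
        self.interval = interval
        self.started: Optional[float] = None
        if preloaded:
            # Preloaded code: background threads may grow memory from the start.
            self.started = time.monotonic() if now is None else now

    def request_arrived(self, now: Optional[float] = None) -> None:
        # Lazily loaded: the first request starts the clock; later ones don't move it.
        if self.started is None:
            self.started = time.monotonic() if now is None else now

    def restart_due(self, now: Optional[float] = None) -> bool:
        if self.started is None:
            return False  # lazily loaded and never used: nothing to recycle
        now = time.monotonic() if now is None else now
        return now - self.started >= self.interval
```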
>>>>
>>>
>>> We just wouldn't want it to die having never served a single request,
>>> so my vote would be against the birth of the process as the starting
>>> point (and, rather, for the first request).
>>>
>>>
>>> It would effectively be from the first request if lazily loaded. If
>>> preloaded, though, because background threads could be created which do
>>> work and consume memory over time, it would be from when the process
>>> started, i.e., when the Python code was preloaded.
>>>
>>>
>> But then, for preloaded code, processes would cycle themselves for no
>> reason throughout inactive periods, such as overnight. That's not the end
>> of the world, but I wonder if we're catering to the wrong design. (These
>> are, after all, web server processes, so it seems a fair assumption that
>> they exist primarily to handle requests; otherwise why even run under
>> Apache?) My vote, for what it's worth, would still be timed from the first
>> request, but I probably won't use that particular option. Either way would
>> be useful for some, I'm sure.
>>
>>
>>>
>>>> Now we have the ability to send a process the graceful restart signal
>>>> (usually SIGUSR1) to force an individual process to restart.
>>>>
>>>> Right now this is tied to the graceful-timeout duration as well, which,
>>>> as you point out, would perhaps be better off having its own duration
>>>> for the notional grace period.
>>>>
>>>> Using the name restart-timeout for this could be confusing if I have a
>>>> restart interval option.
>>>>
>>>>
>>> In my opinion, SIGUSR1 is different from the automatic parameters
>>> because it was (most likely) triggered by user intervention, so that one
>>> should ideally have its own parameter. If that is the case and this
>>> parameter becomes dedicated to SIGUSR1, then the least ambiguous name I can
>>> think of is *sigusr1-timeout*.
>>>
>>>
>>>
>>> Except that it isn't guaranteed to be called SIGUSR1. Technically it
>>> could be a different signal depending on the platform Apache runs on. But
>>> then, as far as I know, all UNIX systems do use SIGUSR1.
>>>
>>>
>> In any case, they are "signals": do you like *signal-timeout*? (It could
>> also be taken ambiguously, but maybe less so than restart-timeout?)
>>
>>
>>>> I also have another type of process restart I am trying to work out how
>>>> to accommodate, and the naming of options again complicates the problem.
>>>> In this case we want to introduce an artificial restart delay.
>>>>
>>>> This would be an option to combat the problem being caused by Django
>>>> 1.7, in that WSGI script file loading for Django isn't stateless. If a
>>>> transient problem occurs, such as the database not being ready, the
>>>> loading of the WSGI script file can fail. On the next request an attempt
>>>> is made to load it again, but now Django kicks up a stink because it was
>>>> halfway through setting things up last time when it failed, and the setup
>>>> code cannot be run a second time. The result is that the process then
>>>> keeps failing.
>>>>
>>>> The idea of the restart delay option therefore is to allow you to set
>>>> it to a number of seconds, normally just 1. If it is set and a WSGI
>>>> script file import fails, it will effectively block for the specified
>>>> delay, and when that is over it will kill the process, so the whole
>>>> process is thrown away and the WSGI script file can be reloaded in a
>>>> fresh process. This gets rid of the problem of Django initialisation not
>>>> being able to be retried.
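An application-level analogue of the proposed restart delay can be sketched in Python. This is not mod_wsgi's implementation, just an illustration of the idea under the assumption that it runs in a daemon process, where a replacement process should be started after it exits: if loading fails, log the error, wait for the delay, then terminate the whole process so the retry happens in fresh state.

```python
import os
import sys
import time
import traceback
from typing import Callable


def load_or_linger(load_app: Callable[[], Callable], delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep,
                   exit_process: Callable[[int], None] = os._exit) -> Callable:
    """Load the WSGI application; on failure, wait `delay` seconds and kill
    the process so a fresh one can retry from a clean state."""
    try:
        return load_app()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        sleep(delay)      # throttle the die/replace cycle (avoids a fork bomb)
        exit_process(1)   # os._exit skips cleanup, discarding half-initialised state
        raise             # only reached if exit_process was stubbed out
```

In a WSGI script file this might look like `application = load_or_linger(create_app, delay=1.0)`, where `create_app` is a hypothetical function that performs the framework's setup (for Django, something that calls `django.core.wsgi.get_wsgi_application()`).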
>>>>
>>>>
>>> (We are using TurboGears... I don't think I've seen anything like that
>>> happen, though I have, rarely, seen startup anomalies.)
>>>
>>>
>>>> A delay is needed to avoid an effective fork bomb, where a WSGI script
>>>> file failing to load under high request throughput would cause a constant
>>>> cycle of processes dying and being replaced. It is possible it wouldn't
>>>> be as bad as I think, as Apache only checks for dead processes to replace
>>>> once a second, but I still prefer my own failsafe in case that changes.
>>>>
>>>> I am therefore totally fine with a separate graceful time period for
>>>> when SIGUSR1 is used; I just need to juggle these different features and
>>>> come up with an option naming scheme that makes sense.
>>>>
>>>> How about then that I add the following new options:
>>>>
>>>> maximum-lifetime - Similar to maximum-requests in that it will cause
>>>> the processes to be shut down and restarted, but in this case it will
>>>> occur based on the time period given as the argument, measured from the
>>>> first request, or from when the WSGI script file or any other Python code
>>>> was preloaded, that is, in the latter case, when the process was started.
>>>>
>>>> restart-timeout - Specifies a separate grace period for when the
>>>> process is being forcibly restarted using the graceful restart signal. If
>>>> restart-timeout is not specified and graceful-timeout is specified, then
>>>> the value of graceful-timeout is used. If neither is specified, then the
>>>> restart signal will behave similarly to the process being sent a SIGINT.
>>>>
>>>> linger-timeout - When a WSGI script file, or other Python code, is
>>>> being imported by mod_wsgi directly and that fails, the default is that
>>>> the error is ignored. For a WSGI script file, reloading will be attempted
>>>> on the next request. But if the code is being preloaded, the failure is
>>>> merely logged. If linger-timeout is set to a non-zero value, in seconds,
>>>> then the daemon process will instead be shut down and restarted, to try
>>>> and allow a successful reloading of the code to occur if it was a
>>>> transient issue. To avoid a fork bomb if the issue is persistent, a delay
>>>> will be introduced based on the value of the linger-timeout option.
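The fallback rules proposed for restart-timeout (the option that, per the later reply, ended up named eviction-timeout) can be written as a small function. This is a sketch of the proposal as described in this thread, with hypothetical trigger names; it is not taken from mod_wsgi's source. The maximum-requests case follows the 4.1.0 release-note description of graceful-timeout.

```python
from typing import Mapping, Optional


def grace_period(trigger: str, options: Mapping[str, int]) -> Optional[int]:
    """Seconds of grace before a forced shutdown, per the proposed rules.

    None means no dedicated grace period: the process shuts down as it
    would on SIGINT (the normal shutdown path).
    """
    if trigger == "signal":
        # A dedicated eviction timeout wins; otherwise fall back to graceful-timeout.
        for name in ("eviction-timeout", "graceful-timeout"):
            if name in options:
                return options[name]
        return None
    if trigger == "maximum-requests":
        return options.get("graceful-timeout")
    raise ValueError(f"unknown trigger: {trigger!r}")
```

With the settings from the first message (eviction-timeout=60, graceful-timeout=140), this model predicts a 60-second grace period after SIGUSR1 and 140 seconds after maximum-requests is reached.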
>>>>
>>>>
>>> How does that all sound, if it makes sense, that is? :-)
>>>>
>>>>
>>>
>>> That sounds absolutely great! How would I get on the notification cc:
>>> of the ticket or whatever so I'd be informed of progress on that?
>>>
>>>
>>> These days my turnaround time is pretty quick, so long as I am happy
>>> with and know what to change and how. So I just need to think a bit more
>>> about it and get some day job stuff out of the way before I can do
>>> something.
>>>
>>> So don't be surprised if you simply get a reply to this email within a
>>> week pointing at a development version to try.
>>>
>>>
>> Well, tons of thanks again.
>>
>>
>>> Graham
>>>>
>>>>
>>>
>>>> On 17/01/2015, at 12:27 AM, Kent wrote:
>>>>
>>>> Thanks again. Yes, I did take our current version from the repo
>>>> because you hadn't released the SIGUSR1 bit yet... I should upgrade now.
>>>>
>>>> As for the very long graceful-timeout, I was skirting around that
>>>> solution because I like where the setting is currently for SIGUSR1. I
>>>> would like to ask, "Is there a way to indicate a different
>>>> graceful-timeout
>>>> for handling SIGUSR1 vs. maximum-requests?" but I already have the
>>>> answer from the release notes: "No."
>>>>
>>>> I don't know if you can see the value in distinguishing the two, but
>>>> maximum-requests is sort of a "standard operating mode," so it might make
>>>> sense for a modwsgi user to want a longer, very gentle timeout there,
>>>> whereas SIGUSR1, while beautifully more graceful than SIGKILL, still
>>>> "means business," so the same user may want a shorter, more responsive
>>>> timeout there (while still allowing a decent chunk of time for being
>>>> graceful to running requests). That is the case for me at least. Any
>>>> chance you'd entertain that as a feature request?
>>>>
>>>> Even if not, you've been extremely helpful, thank you! And thanks for
>>>> pointing me to the correct version of the documentation: I thought I was
>>>> reading the current version.
>>>> Kent
>>>>
>>>> P.S. I also have ideas for possible vertical URL partitioning, but
>>>> unfortunately, our app has much cross-over by URL, so that's why I'm down
>>>> this maximum-requests path...
>>>>
>>>>
>>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>>>>
>>>>>
>>>>> On 16/01/2015, at 7:28 AM, Kent wrote:
>>>>>
>>>>> I'm running version 4 (a very early build of it, possibly from before
>>>>> you officially released it). We upgraded to take advantage of the
>>>>> amazingly-helpful SIGUSR1 signaling for graceful process restarting,
>>>>> which we use somewhat regularly to gracefully deploy software changes
>>>>> (minor ones which won't matter if 2 processes have different versions
>>>>> loaded) without disrupting users. Thanks a ton for that!
>>>>>
>>>>>
>>>>> SIGUSR1 support was released in version 4.1.0.
>>>>>
>>>>>
>>>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>>>>
>>>>> That same version has all the other changes, so long as you are using
>>>>> the actual 4.1.0 release and not a copy of the repo from before the
>>>>> official release.
>>>>>
>>>>> If you're not sure, it's best to just upgrade to the latest version if
>>>>> you can.
>>>>>
>>>>> We are also multi-threading our processes (plural processes, plural
>>>>> threads).
>>>>>
>>>>> Some requests could be (validly) running for very long periods of time
>>>>> (database reporting, maybe even half hour, though that would be very
>>>>> extreme).
>>>>>
>>>>> Some processes (especially those generating PDFs, for example) hog
>>>>> tons of RAM, as you know, so I'd like these to eventually check their RAM
>>>>> back in, so to speak, by utilizing either inactivity-timeout or
>>>>> maximum-requests, but always in a very gentle way, since, as I
>>>>> mentioned, some requests might be properly running, even if for many
>>>>> minutes. maximum-requests seems too brutal for my use case, since the
>>>>> threshold request sends the process down the
>>>>> graceful-timeout/shutdown-timeout path and then SIGKILLs it, even if
>>>>> there are valid requests running. My ideal vision of "maximum-requests,"
>>>>> since it is *primarily for memory management,* is to be very gentle,
>>>>> sort of a "ok, now that I've hit my threshold, at my next earliest
>>>>> convenience, I should die, but only once all my current requests have
>>>>> ended of their own accord."
>>>>>
>>>>>
>>>>> That is where, if you vertically partition those URLs out to a separate
>>>>> daemon process group, you can simply set a very high graceful-timeout
>>>>> value.
>>>>>
>>>>> So relying on the feature:
>>>>>
>>>>> """
>>>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is
>>>>> applied in a number of circumstances.
>>>>>
>>>>> When maximum-requests and this option are used together, when maximum
>>>>> requests is reached, rather than immediately shutting down, potentially
>>>>> interrupting active requests if they don’t finish within the shutdown
>>>>> timeout, you can specify a separate graceful shutdown period. If all
>>>>> requests are completed within this time frame then it will shut down
>>>>> immediately, otherwise the normal forced shutdown kicks in. In some
>>>>> respects this is just allowing a separate shutdown timeout for cases where
>>>>> requests could be interrupted and could avoid it if possible.
>>>>> """
>>>>>
>>>>> You could set:
>>>>>
>>>>> maximum-requests=20 graceful-timeout=600
>>>>>
>>>>> So as soon as it hits 20 requests, it switches to a mode where it will
>>>>> restart once there are no active requests. You can set that timeout as
>>>>> high as you want, even hours, and it will not care.
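The behaviour in the quoted release note can be modelled roughly as follows. This is a sketch under stated assumptions: only requests already active when maximum-requests is reached are considered, and what happens once the forced shutdown begins (shutdown-timeout, then a kill) is not modelled.

```python
from typing import Iterable, Tuple


def graceful_outcome(trigger: float, request_ends: Iterable[float],
                     graceful_timeout: float) -> Tuple[float, bool]:
    """Return (time shutdown begins, whether it was forced).

    `trigger` is when maximum-requests was reached and `request_ends` are the
    times the requests active at that moment would finish on their own.
    """
    deadline = trigger + graceful_timeout
    last_end = max(request_ends, default=trigger)
    if last_end <= deadline:
        # Everything finished inside the grace period: shut down right away.
        return max(last_end, trigger), False
    # Grace period expired first: the normal forced shutdown takes over.
    return deadline, True
```

With graceful-timeout=600, a request finishing 300 seconds after the trigger lets the process exit cleanly at 300 seconds, while one that would run for 900 seconds hits the forced path at 600.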
>>>>>
>>>>> "inactivity-timeout" seems to function exactly as I want, in that it
>>>>> seems like it won't ever kill a process with a thread that has an active
>>>>> request (at least, I can't get it to, even by adding a long import
>>>>> time;time.sleep(longtime)... it doesn't seem to die until the request
>>>>> is finished). But that's why the documentation made me nervous, because
>>>>> it implies that it *could*, in fact, kill a process with an active
>>>>> request: "For the purposes of this option, being idle means no new
>>>>> requests being received, or no attempts by current requests to read
>>>>> request content or generate response content for the defined period."
>>>>>
>>>>>
>>>>> The release notes for 4.1.0 say:
>>>>>
>>>>> """
>>>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results
>>>>> in the daemon process being restarted after the idle timeout period where
>>>>> there are no active requests. Previously it would also interrupt a long
>>>>> running request. See the new request-timeout option for a way of
>>>>> interrupting long running, potentially blocked requests and restarting
>>>>> the
>>>>> process.
>>>>> """
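The 4.1.0 rule quoted above can be stated as a one-line check. This is a simplified model for illustration, assuming `idle_since` is when the last request finished; it is not mod_wsgi's code.

```python
def inactivity_restart_due(active_requests: int, idle_since: float,
                           now: float, inactivity_timeout: float) -> bool:
    """4.1.0+ rule: restart only when no requests are active and the process
    has been idle for the whole inactivity-timeout period."""
    if active_requests > 0:
        return False  # a long-running request is never treated as idle
    return now - idle_since >= inactivity_timeout
```

With inactivity-timeout=1800, a process with one long report still running is never restarted by this rule, however long the report takes.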
>>>>>
>>>>> I'd rather have a more gentle "maximum-requests" than
>>>>> "inactivity-timeout" because then, even on very heavy days (when RAM is
>>>>> most likely to choke), I could gracefully turn over these processes a
>>>>> couple times a day, which I couldn't do with "inactivity-timeout" on an
>>>>> extremely heavy day.
>>>>>
>>>>> Hope this makes sense. I'm really asking:
>>>>>
>>>>> 1. whether inactivity-timeout triggering will ever SIGKILL a
>>>>> process with an active request, as the docs intimate
>>>>>
>>>>> No, not from 4.1.0 onwards.
>>>>>
>>>>>
>>>>> 2. whether there is any way to get maximum-requests to behave more
>>>>> gently under all circumstances
>>>>>
>>>>> By setting a very, very long graceful-timeout.
>>>>>
>>>>>
>>>>> 3. for your ideas/best advice
>>>>>
>>>>> Have a good read through the release notes for 4.1.0.
>>>>>
>>>>> Not necessarily useful in your case, but also look at request-timeout.
>>>>> It can act as a final fail-safe for when things are stuck, but since it
>>>>> is more of a fail-safe, it doesn't make use of graceful-timeout.
>>>>>
>>>>> Graham
>>>>>
>>>>>
>>>>> Thanks for your help!
>>>>>
>>>>>
>>>>>
>>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 15/01/2015, at 8:32 AM, Kent wrote:
>>>>>>
>>>>>> > Graham, the docs state: "For the purposes of this option, being
>>>>>> idle means no new requests being received, or no attempts by current
>>>>>> requests to read request content or generate response content for the
>>>>>> defined period."
>>>>>> >
>>>>>> > This implies to me that a running request that is taking a long
>>>>>> time could actually be killed as if it were idle (suppose it were
>>>>>> fetching
>>>>>> a very slow database query). Is this the case?
>>>>>>
>>>>>> This is the case for mod_wsgi prior to version 4.0.
>>>>>>
>>>>>> Things have changed in mod_wsgi 4.X.
>>>>>>
>>>>>> How long are your long-running requests, though? The
>>>>>> inactivity-timeout was more about restarting infrequently used
>>>>>> applications so that memory could be reclaimed.
>>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> > Also, I'm looking for an ultra-conservative and graceful method of
>>>>>> recycling memory. I've read your article on URL partitioning, which was
>>>>>> useful, but sooner or later one must rely on either inactivity-timeout
>>>>>> or maximum-requests, is that accurate? But both of these will
>>>>>> eventually, after the graceful timeout/shutdown timeout, potentially
>>>>>> kill active requests. It is valid for our app to handle long-running
>>>>>> reports, so I was hoping for an ultra-safe mechanism.
>>>>>> > Do you have any advice here?
>>>>>>
>>>>>> The options available in mod_wsgi 4.X are much better in this area
>>>>>> than 3.X. The changes in 4.X aren't covered in the main documentation,
>>>>>> though, and are only described in the release notes for the version
>>>>>> where each change was made.
>>>>>>
>>>>>> In 4.X the concept of an inactivity-timeout is now separate from the
>>>>>> idea of a request-timeout. There is also a graceful-timeout that can be
>>>>>> applied to maximum-requests and some other situations as well to allow
>>>>>> requests to finish out properly before being more brutal. One can also
>>>>>> signal the daemon processes to do a more graceful restart as well.
>>>>>>
>>>>>> You cannot totally avoid having to be brutal and kill things, though,
>>>>>> or else you have no fail-safe for a stuck process where all request
>>>>>> threads were blocked on back-end services and were never going to
>>>>>> recover. Use of multithreading in a process also complicates the
>>>>>> implementation of request-timeout.
>>>>>>
>>>>>> Anyway, the big question is what version are you using?
>>>>>>
>>>>>> Graham
>>>>>>
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "modwsgi" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>