Yes, sir, my tests also seem to show it works as you intend it to. Thanks.
On Tue, Feb 3, 2015 at 5:20 AM, Graham Dumpleton <[email protected]> wrote:

> Should now be fixed.
>
> On 03/02/2015, at 8:50 PM, Graham Dumpleton <[email protected]> wrote:
>
> The application of the eviction timeout should now be fixed in the develop branch.
>
> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>
> Graham
>
> On 03/02/2015, at 5:02 PM, Graham Dumpleton <[email protected]> wrote:
>
>> On 3 February 2015 at 04:15, Kent Bower <[email protected]> wrote:
>>
>> On Sun, Feb 1, 2015 at 7:08 PM, Graham Dumpleton <[email protected]> wrote:
>>
>>> Your Flask client doesn't need to know about Celery, as your web application accepts requests as normal and it is your Python code which would queue the job with Celery.
>>>
>>> Now looking back, the only configuration I can find, though I don't know if it is your actual production configuration, is:
>>>
>>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>>
>>> Provided that you don't then start to have overall host memory issues, the simplest way around this whole issue is not to use a multithreaded process.
>>>
>>> What you would do is vertically partition your URL namespace so that just the URLs which do the long running report generation would be delegated to single threaded processes. Everything else would keep going to the multithreaded processes.
>>> WSGIDaemonProcess rarch processes=3 threads=2
>>> WSGIDaemonProcess rarch-long-running processes=6 threads=1 maximum-requests=20
>>>
>>> WSGIProcessGroup rarch
>>>
>>> <Location /suburl/of/long/running/report/generator>
>>> WSGIProcessGroup rarch-long-running
>>> </Location>
>>>
>>> You wouldn't even have to worry about the graceful-timeout on rarch-long-running, as that is only relevant for maximum-requests where it is a multithreaded process.
>>>
>>> So what would happen is that when the request has finished, if maximum-requests is reached, the process would be restarted even before any new request was accepted by the process, so there is no chance of a new request being interrupted.
>>>
>>> You could still set an eviction-timeout of some suitably large value to allow you to use SIGUSR1 to be sent to processes in that daemon process group to shut them down.
>>>
>>> In this case, having eviction-timeout able to be set independent of graceful-timeout (for maximum-requests) is probably useful, so I will retain the option.
>>>
>>> So is there any reason you couldn't use a daemon process group with many single threaded processes instead?
>>
>> This is very good to know (that single threaded procs would behave more ideally in these circumstances). The above was just my configuration for testing 'eviction-timeout'. Our software generally runs with many more processes and threads, on servers with maybe 16 or 32 GB RAM. And unfortunately, the RAM is the limiting resource here, as our python app, built on TurboGears, is a memory hog and we have yet to find the resources to dissect that. I was aiming to head in the direction of URL partitioning, but there are big obstacles. (Chiefly, RAM consumption would make threads=1 and yet more processes very difficult unless we spend the huge effort in dissecting the app to locate and pull the many unused memory hogging libraries out.)
>> So, URL partitioning is sort of the ideal, distant solution, as is a Celery-like polling solution, but out of my reach for now.
>
> Have you ever run a test where you compare the whole memory usage of your application where all URLs are visited, to how much memory is used if only the URL which generates the long running report is visited?
>
> In Django at least, a lot of stuff is lazily loaded only when a URL requiring it is first accessed. So even with a heavy code base, there can still be benefits in splitting out URLs to their own processes, because the whole code base wouldn't be loaded, due to the lazy loading.
>
> So do you have any actual memory figures from doing that?
>
> How many URLs are there that generate these reports vs those that don't, or is that all the whole application does?
>
> Are your most frequently visited URLs those generating the reports or something else?
>
>> Another question for multithreaded graceful-timeout with maximum-requests: during a period of heavy traffic, it seems the graceful-timeout setting just pushes the real timeout until shutdown-timeout because, if heavy enough, you'll be getting requests during graceful-timeout. That diminishes the fidelity of "graceful-timeout". Do you see where I'm coming from (even if you're happy with the design and don't want to mess with it, which I'd understand)?
>>
>> Ok, here is the log demonstrating the troubles I saw with eviction-timeout.
>> For demonstration purposes, here is the simplified directive I'm using:
>>
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>
>> Here is the log:
>>
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: generating secret for digest authentication ...
>> [Mon Feb 02 11:36:16 2015] [notice] Digest: done
>> [Mon Feb 02 11:36:16 2015] [info] APR LDAP: Built with OpenLDAP LDAP SDK
>> [Mon Feb 02 11:36:16 2015] [info] LDAP: SSL support available
>> [Mon Feb 02 11:36:16 2015] [info] Init: Seeding PRNG with 256 bytes of entropy
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary RSA private keys (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary DH parameters (512/1024 bits)
>> [Mon Feb 02 11:36:16 2015] [info] Shared memory session cache initialised
>> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
>> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface: mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Python home /home/rarch/tg2env.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Initializing Python.
>> [Mon Feb 02 11:36:16 2015] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>> [Mon Feb 02 11:36:16 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Attach interpreter ''.
>> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 11:39:13 2015] [info] mod_wsgi (pid=29447): Process eviction requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Daemon process graceful timer expired 'rarch'.
>> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Shutdown requested 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Aborting process 'rarch'.
>> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Exiting process 'rarch'.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has died, deregister and restart it.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has been deregistered and will no longer be monitored.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Python home /home/rarch/tg2env.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Initializing Python.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Attach interpreter ''.
>> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>>
>> The process was signaled at 11:39:13 with eviction-timeout=60, but 11:40:13 came and passed and nothing happened until 107 seconds passed, at which time the graceful timer expired.
>> Next, I changed the parameters a little:
>>
>> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP} eviction-timeout=30 graceful-timeout=240 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Python home /home/rarch/tg2env.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Initializing Python.
>> [Mon Feb 02 12:06:57 2015] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>> [Mon Feb 02 12:06:57 2015] [info] Server built: Aug 30 2010 12:28:40
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Attach interpreter ''.
>> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>> [Mon Feb 02 12:07:19 2015] [info] mod_wsgi (pid=3381): Process eviction requested, waiting for requests to complete 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Daemon process graceful timer expired 'rarch'.
>> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Shutdown requested 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Aborting process 'rarch'.
>> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Exiting process 'rarch'.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has died, deregister and restart it.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has been deregistered and will no longer be monitored.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Starting process 'rarch' with uid=48, gid=48 and threads=1.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Python home /home/rarch/tg2env.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Initializing Python.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Attach interpreter ''.
>> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028, process='rarch', application=''): Loading WSGI script '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>>
>> So, for me, eviction-timeout is apparently being ignored...
>
> The background monitor thread which monitors for expiry wasn't taking into consideration that the eviction timeout period could be less than the graceful timeout. I didn't see a problem because I was also setting a request timeout, which changes the way the monitor thread works, waking it up every second regardless. I will work on a fix for that.
>
> Another issue for consideration is if a graceful timeout is already in progress and a signal comes in for eviction, which timeout wins? Right now the eviction time will trump the graceful time if already set by maximum requests. The converse isn't true though: if already in the eviction cycle and maximum requests arrives, it wouldn't be trumped by the graceful timeout. So eviction time has authority, given that it was triggered by an explicit user signal. It does mean that the signal could effectively extend whatever graceful time was in progress.
>
> Graham
>
>> Thanks again for all your time and help,
>> Kent
>>
>>> Note that since only a subset of URLs would go to the daemon process group, the memory usage profile will change, as you aren't potentially loading the complete application code into those processes, only the code needed for that URL and that report. So it could use less memory than the application as a whole, allowing you to have multiple single threaded processes with no issue.
>>>
>>> Graham
>>>
>>> On 31/01/2015, at 12:31 AM, Kent <[email protected]> wrote:
>>>
>>> Thanks for your reply and recommendations.
>>> We're aware of the issues, but I didn't give the full picture for brevity's sake. The reports are user generated reports. Ultimately, the users know whether the reports should return quickly (which many, many will), or whether they are long-running. There is no way for the application to know that, so to avoid some sort of polling (which we've done in the past and was a pain in the rear to users), the design is to allow the *user* to decide whether to run the report in the background or "foreground" via a check box. Since most reports will return in a matter of a minute or so, we wanted to avoid the pain of making them poll, but I need to look at Celery. However, I'm not comfortable punishing users for accidentally choosing foreground on a long-running report. That is, not via an automatic turn-over mechanism like maximum-requests or inactivity-timeout. In my mind, those are inherently different from something like a SIGUSR1 mechanism, because the former are automatic.
>>>
>>> So, while admitting there are edge cases we are using that don't have a perfect solution (or even admitting we need a better mechanism in that case), it still seems to me mod_wsgi should be somewhat agnostic of design choices. In other words, when it comes to *automatic* turning over of processes, it seems mod_wsgi shouldn't be involved with length-of-time considerations, except to allow the user to specify timeouts. See, the long running reports are only one of my concerns: we also fight with database locks sometimes, held by another application attached to the same database and wholly out of our control. Sometimes those locks can be held for many minutes on a request that normally should complete within seconds. There too, it seems mod_wsgi should be very gentle in the automatic turnover cases.
>>>
>>> Thanks for pointing to Celery.
>>> I really wonder whether I can get a message broker to work with Adobe Flash, our current client, but I haven't looked into this much yet.
>>>
>>> Also, my apologies if you believe this to have been a waste of time on your part. You've been extremely helpful, though, and I'm quite thankful for your time! I understand you not wanting to redesign the shutdown-timeout thing and mess with what otherwise isn't broken. Would you still like me to post the apache debug logs regarding 'eviction-timeout', or have you changed your mind about releasing that? (In which case, extra apologies.)
>>>
>>> Kent
>>>
>>> On Friday, January 30, 2015 at 6:34:28 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> If you have web requests generating reports which take 40 minutes to run, you are going the wrong way about it.
>>>>
>>>> What would be regarded as best practice for long running requests is to use a task queuing system to queue up the task to be run and run it in a distinct set of processes to the web server. Your web request can then return immediately, with some sort of polling system used as necessary to check the progress of the task and allow the result to be downloaded when complete. By using a separate system to run the tasks, it doesn't matter whether the web server is restarted, as the tasks will still run, and after the web server is restarted a user can still check on progress of the tasks and get back his response.
>>>>
>>>> The most common such task execution system for doing this sort of thing is Celery.
>>>>
>>>> So it is because you aren't using the correct tool for the job here that you are fighting against things like timeouts in the web server. No web server is really a suitable environment to be used as an in-process task execution system.
>>>> The web server should handle requests quickly and offload longer processing tasks to a separate task system which is purpose built for handling the management of long running tasks.
>>>>
>>>> I am not inclined to keep fiddling with how the timeouts work now that I understand what you are trying to do. I am even questioning whether I should have introduced the separate eviction timeout I already did, given that it is turning out to be a questionable use case.
>>>>
>>>> I would really recommend you look at re-architecting how you do things. I don't think I would have any trouble finding others on the list who would advise the same thing and who could also give you further advice on using something like Celery instead for task execution.
>>>>
>>>> Graham
>>>>
>>>> On 29/01/2015, at 7:30 AM, Kent <[email protected]> wrote:
>>>>
>>>> Ok, I plan to run those tests with debug and post, but please, in the meantime:
>>>>
>>>> For our app, not interrupting existing requests is a higher priority than being able to accept new requests, particularly since we typically run many wsgi processes, each with a handful of threads. So, I'm not really concerned about maintaining always-available threads (statistically, I will be fine... that isn't the issue for me).
>>>>
>>>> In these circumstances, it would be much better for all these triggering events (SIGUSR1, maximum-requests, or inactivity-timeout, etc.) to immediately stop accepting new requests and "concentrate" on shutting down. (Unless that means requests waiting in apache are terminated because they were queued for this particular process, but I doubt apache has already determined the request's process if *none* are available, has it?)
>>>> With high graceful-timeout/eviction-timeout and low shutdown-timeout, I run a pretty high risk of accepting a new request at the tail end of graceful-timeout or eviction-timeout, only to have it basically doomed to an ungraceful death, because many of our requests are long running (very often well over 5 or 10 sec).
>>>>
>>>> I guess that's why, through experimentation with SIGUSR1 a few years back, I ended up with "graceful-timeout=5 shutdown-timeout=300"... the opposite of how it would default, because this works well when trying to signal these to recycle themselves: they basically immediately stop accepting new requests, so your "guaranteed" graceful timeout is 300. It seems I have no way to "guarantee" a very large graceful timeout for each and every request, even if affected by maximum-requests or inactivity-timeout, and specify a different (lower) one for SIGUSR1, because the only truly guaranteed lifetime in seconds is "shutdown-timeout," is that accurate?
>>>>
>>>> The ideal for our app, which may accept certain requests that run for several minutes, is this:
>>>>
>>>> - if maximum-requests or inactivity-timeout is hit, stop taking new requests immediately and shut down as soon as possible, but give existing requests basically all the time they need to finish (say, up to 40 minutes (for long-running db reports)).
>>>> - if SIGUSR1, stop taking new requests immediately and shut down as soon as possible, but give existing requests a really good chance to complete, maybe 3-5 minutes, but not the 40 minutes, because this is slightly more urgent (it was triggered manually and a user is monitoring/waiting for turnover and wants new code in place)
>>>>
>>>> I don't think I can accomplish the above, if I understand the design correctly, because a request may have been accepted at the tail end of graceful-timeout/eviction-timeout and so is only guaranteed a lifetime of shutdown-timeout, regardless of what the trigger was (SIGUSR1 vs. automatic).
>>>>
>>>> Is my understanding of this accurate?
>>>>
>>>> On Tuesday, January 27, 2015 at 9:48:01 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> Can you ensure that LogLevel is set to at least info and provide what messages are in the Apache error log file?
>>>>
>>>> If I use:
>>>>
>>>> $ mod_wsgi-express start-server hack/sleep.wsg --log-level=debug --verbose-debugging --eviction-timeout 30 --graceful-timeout 60
>>>>
>>>> which is equivalent to:
>>>>
>>>> WSGIDaemonProcess … graceful-timeout=60 eviction-timeout=30
>>>>
>>>> and fire a request against an application that sleeps a long time, I see in the Apache error logs at the time of the signal:
>>>>
>>>> [Wed Jan 28 13:34:34 2015] [info] mod_wsgi (pid=29639): Process eviction requested, waiting for requests to complete 'localhost:8000'.
>>>>
>>>> At the end of the 30 seconds given by the eviction timeout I see:
>>>>
>>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Daemon process graceful timer expired 'localhost:8000'.
>>>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Shutdown requested 'localhost:8000'.
>>>>
>>>> Up till that point the process would still have been accepting new requests, and was waiting for the point where there were no active requests, to allow it to shut down.
>>>> As the timeout tripped at 30 seconds, it then instead goes into the more brutal shutdown process. No new requests are accepted from this point.
>>>>
>>>> For my setup the shutdown-timeout defaults to 5 seconds, and because the request still hadn't completed within 5 seconds, the process is exited anyway and allowed to shut down.
>>>>
>>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Aborting process 'localhost:8000'.
>>>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Exiting process 'localhost:8000'.
>>>>
>>>> Because the application never returned a response, that results in the Apache child worker which was trying to talk to the daemon process seeing a truncated response.
>>>>
>>>> [Wed Jan 28 13:35:10 2015] [error] [client 127.0.0.1] Truncated or oversized response headers received from daemon process 'localhost:8000': /tmp/mod_wsgi-localhost:8000:502/htdocs/
>>>>
>>>> When the Apache parent process notices the daemon process has died, it cleans up and starts a new one.
>>>>
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 'localhost:8000' has died, deregister and restart it.
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process 'localhost:8000' has been deregistered and will no longer be monitored.
>>>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29764): Starting process 'localhost:8000' with threads=5.
>>>>
>>>> So the shutdown phase specified by shutdown-timeout is subsequent to eviction-timeout. It is one last chance to shut down, during a time that no new requests are accepted, in case it is the constant flow of requests that is preventing it, rather than one long running request.
>>>>
>>>> The shutdown-timeout should always be kept quite short because no new requests will be accepted during that time. So changing it from the default isn't something one would normally do.
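[Editor's sketch: a long-sleeping WSGI application of the kind being tested above. The thread doesn't show the contents of Graham's sleep script, so the body text and the 60-second default here are invented for illustration; the optional `sleep` parameter exists only so the callable can also be exercised quickly outside a server.]

```python
import time

# Minimal WSGI app that holds a request open, useful for observing how
# graceful-timeout / eviction-timeout interact with an in-flight request.
SLEEP_SECONDS = 60  # arbitrary choice for illustration

def application(environ, start_response, sleep=None):
    time.sleep(SLEEP_SECONDS if sleep is None else sleep)
    body = b"finally done\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Fire one request at this under a small eviction-timeout and the log sequence above (eviction requested, graceful timer expired, shutdown, abort) can be reproduced.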
>>>> Graham
>>>>
>>>> On 28/01/2015, at 3:02 AM, Kent <[email protected]> wrote:
>>>>
>>>> Let me be more specific. I'm having a hard time getting this to test as I expected. Here is my WSGIDaemonProcess directive:
>>>>
>>>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800 display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60 python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>>>
>>>> I put a 120 sec sleep in one of the processes' requests and then sent SIGUSR1 (Linux) to all three processes. The two inactive ones immediately restart, as I expect. However, the 3rd (sleeping) one is allowed to run past the 60 second eviction-timeout and runs straight to the graceful-timeout before it is terminated. Shouldn't it have been killed at 60 sec?
>>>>
>>>> (And then, as my previous question, how does shutdown-timeout factor into all this?)
>>>>
>>>> Thanks again!
>>>> Kent
>>>>
>>>> On Tuesday, January 27, 2015 at 9:34:12 AM UTC-5, Kent wrote:
>>>>
>>>> I think I might understand the difference between 'graceful-timeout' and 'shutdown-timeout', but can you please just clarify the difference? Are they additive?
>>>>
>>>> Also, will 'eviction-timeout' interact with either of those, or simply override them?
>>>>
>>>> Thanks,
>>>> Kent
>>>>
>>>> On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> Want to give:
>>>>
>>>> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>>>>
>>>> a go?
>>>>
>>>> The WSGIDaemonProcess directive is 'eviction-timeout'. For mod_wsgi-express the command line option is '--eviction-timeout'.
>>>>
>>>> So the terminology I am using around this is that sending a signal is like forcibly evicting the WSGI application, allowing the process to be restarted. At least this way we can have an option name that is distinct enough from a generic 'restart' so as not to be confusing.
>>>> Graham
>>>>
>>>> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>>>>
>>>> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>>>
>>>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> There are a few possibilities here of how this could be enhanced/changed.
>>>>
>>>> The problem with maximum-requests is that it can be dangerous. People can set it too low, and when their site gets a big spike of traffic the processes can be restarted too quickly, only adding to the load of the site and causing things to slow down, hampering their ability to handle the spike. This is where setting a longer amount of time for graceful-timeout helps, because you can set it to be quite large. The use of maximum-requests can still be like using a hammer though, and one which can be applied unpredictably.
>>>>
>>>> Yes, I can see that. (It may be overkill, but you could default a separate minimum-lifetime parameter so only users who specifically mess with that as well as maximum-requests shoot themselves in the foot, but it is starting to get confusing with all the different timeouts, I'll agree there...)
>>>>
>>>> The minimum-lifetime option is an interesting idea. It may have to do nothing by default to avoid conflicts with existing expected behaviour.
>>>>
>>>> The maximum-requests option also doesn't help in the case where you are running background threads which do stuff, and it is them, and not the number of requests coming in, that dictate things like the memory growth you want to counter.
>>>>
>>>> True, but solving with maximum lifetime... well, actually, solving memory problems with *any* of these mechanisms isn't measuring the heart of the problem, which is RAM.
>>>> I imagine there isn't a good way to measure RAM or you would have added that option by now. Seems what we are truly after for the majority of these isn't how many requests or how long it's been up, etc., but how much RAM it is taking (or perhaps, optionally, average RAM per thread, instead). If my process exceeds consuming 1.5GB, then trigger a graceful restart at the next appropriate convenience, being gentle to existing requests. That may arguably be the most useful parameter.
>>>>
>>>> The problem with calculating memory is that there isn't one cross platform portable way of doing it. On Linux you have to dive into the /proc file system. On MacOS X you can use C API calls. On Solaris I think you again need to dive into a /proc file system, but it obviously has a different file structure for getting details out compared to Linux. Adding such cross platform stuff in gets a bit messy.
>>>>
>>>> What I was moving towards, as an extension of the monitoring stuff I am doing for mod_wsgi, was to have a special daemon process you can set up which has access to some sort of management API. You could then create your own Python script that runs in that and which, using the management API, can get daemon process pids and then use Python psutil to get memory usage on a periodic basis, and then you decide if a process should be restarted and send it a signal to stop. Or the management API could provide a way to notify, maybe by signal, or maybe using a shared memory flag, that a daemon process should shut down.
>>>>
>>>> I figured there was something making that a pain...
>>>>
>>>> So the other option I have contemplated adding a number of times is one to periodically restart the process. The way this would work is that a process restart would be done periodically based on what time was specified.
>>>> You could therefore say the restart interval was 3600 and it would restart the process once an hour.
>>>>
>>>> The start of the time period for this would either be when the process was created, if any Python code or a WSGI script was preloaded at process start time, or it would be from when the first request arrived if the WSGI application was lazily loaded. This restart-interval could be tied to the graceful-timeout option so that you can set an extended period if you want to try and ensure that requests are not interrupted.
>>>>
>>>> We just wouldn't want it to die having never even served a single request, so my vote would be *against* the birth of the process as the beginning point (and, rather, for the first request).
>>>>
>>>> It would effectively be from the first request if lazily loaded. If preloaded though, as background threads could be created which do stuff and consume memory over time, it would then be from when the process started, i.e., when the Python code was preloaded.
>>>>
>>>> But then, for preloaded, processes life-cycle themselves for no reason throughout inactive periods, like maybe overnight. That's not the end of the world, but I wonder if we're catering to the wrong design. (These are, after all, webserver processes, so it seems a fair assumption that they exist primarily to handle requests, else why even run under apache?) My vote, for what it's worth, would still be timed from first request, but I probably won't use that particular option. Either way would be useful for some, I'm sure.
>>>>
>>>> Now we have the ability to send the process a graceful restart signal (usually SIGUSR1), to force an individual process to restart.
>>>>
>>>> Right now this is tied to the graceful-timeout duration as well, which, as you point out, would perhaps be better off having its own time duration for the notional grace period.
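[Editor's sketch: the external memory monitor Graham describes earlier (poll daemon process memory, signal the process if it is too big), written against Linux /proc instead of psutil so it is dependency-free. Linux-only; the helper names are hypothetical and the 1.5GB threshold echoes Kent's example. SIGUSR1 is the graceful restart signal mentioned throughout the thread.]

```python
import os
import signal

def rss_kb(pid):
    """Resident set size of a process in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def evict_if_bloated(pids, limit_kb=1_500_000):
    """Send SIGUSR1 (graceful restart) to any daemon process over the limit."""
    for pid in pids:
        if rss_kb(pid) > limit_kb:
            os.kill(pid, signal.SIGUSR1)
```

Run on a periodic timer, this approximates the "restart at 1.5GB, gently" behaviour Kent asks for, without mod_wsgi itself needing cross-platform memory accounting.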
>>>> Using the name restart-timeout for this could be confusing if I have a restart interval option.
>>>>
>>>> In my opinion, SIGUSR1 is different from the automatic parameters because it was (most likely) triggered by user intervention, so that one should ideally have its own parameter. If that is the case and this parameter becomes dedicated to SIGUSR1, then the least ambiguous name I can think of is *sigusr1-timeout*.
>>>>
>>>> Except that it isn't guaranteed to be called SIGUSR1. Technically it could be a different signal dependent on the platform that Apache runs on. But then, as far as I know, all UNIX systems do use SIGUSR1.
>>>>
>>>> In any case, they are "signals": do you like *signal-timeout*? (Also could be taken ambiguously, but maybe less so than restart-timeout?)
>>>>
>>>> I also have another type of process restart I am trying to work out how to accommodate, and the naming of options again complicates the problem. In this case we want to introduce an artificial restart delay.
>>>>
>>>> This would be an option to combat the problem which is being caused by Django 1.7, in that WSGI script file loading for Django isn't stateless. If a transient problem occurs, such as the database not being ready, the loading of the WSGI script file can fail. On the next request an attempt is made to load it again, but now Django kicks up a stink because it was half way through setting things up last time when it failed, and the setup code cannot be run a second time. The result is that the process then keeps failing.
>>>>
>>>> The idea of the restart delay option therefore is to allow you to set it to a number of seconds, normally just 1.
If set like that, if a WSGI script file import fails, it will effectively block for the delay specified, and when that is over it will kill the process, so the whole process is thrown away and the WSGI script file can be reloaded in a fresh process. This gets rid of the problem of Django initialisation not being able to be retried.
>>>>
>>>> (We are using TurboGears... I don't think I've seen something like that happen, but we have rarely seen start-up anomalies.)
>>>>
>>>> A delay is needed to avoid an effective fork bomb, where a WSGI script file not loading under high request throughput would cause a constant cycle of processes dying and being replaced. It is possible it wouldn't be as bad as I think, as Apache only checks for dead processes to replace once a second, but I still prefer my own failsafe in case that changes.
>>>>
>>>> I am therefore totally fine with a separate graceful time period for when SIGUSR1 is used; I just need to juggle these different features and come up with an option naming scheme that makes sense.
>>>>
>>>> How about then that I add the following new options:
>>>>
>>>> maximum-lifetime - Similar to maximum-requests in that it will cause the processes to be shut down and restarted, but in this case it will occur based on the time period given as argument, measured from the first request, or from when the WSGI script file or any other Python code was preloaded, that is, in the latter case, from when the process was started.
>>>>
>>>> restart-timeout - Specifies a separate grace period for when the process is being forcibly restarted using the graceful restart signal. If restart-timeout is not specified and graceful-timeout is specified, then the value of graceful-timeout is used. If neither is specified, then the restart signal will behave similarly to the process being sent a SIGINT.
>>>> linger-timeout - When a WSGI script file, or other Python code, is being imported by mod_wsgi directly, if that fails the default is that the error is ignored. For a WSGI script file, reloading will be attempted on the next request. But if preloading code, then it will fail and merely be logged. If linger-timeout is specified with a non-zero value, the value being seconds, then the daemon process will instead be shut down and restarted, to try and allow a successful reloading of the code to occur if it was a transient issue. To avoid a fork bomb in the case of a persistent issue, a delay will be introduced based on the value of the linger-timeout option.
>>>>
>>>> How does that all sound, if it makes sense that is. :-)
>>>>
>>>> That sounds absolutely great! How would I get on the notification cc: of the ticket or whatever so I'd be informed of progress on that?
>>>>
>>>> These days my turnaround time is pretty quick so long as I am happy and know what to change and how. So I just need to think a bit more about it and get some day job stuff out of the way before I can do something.
>>>>
>>>> So don't be surprised if you simply get a reply to this email within a week pointing at a development version to try.
>>>>
>>>> Well, tons of thanks again.
>>>>
>>>> Graham
>>>>
>>>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>>>
>>>> Thanks again. Yes, I did take our current version from the repo because you hadn't released the SIGUSR1 bit yet... I should upgrade now.
>>>>
>>>> As for the very long graceful-timeout, I was skirting around that solution because I like where the setting is currently for SIGUSR1. I would like to ask, "Is there a way to indicate a different graceful-timeout for handling SIGUSR1 vs. maximum-requests?" but I already have the answer from the release notes: "No."
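Taken together, the three proposals above might look something like the following in an Apache configuration. This is a hypothetical sketch: maximum-lifetime, restart-timeout, and linger-timeout were still proposed names at this point in the thread, so check the release notes of your mod_wsgi version before relying on them.

```apache
# Hypothetical sketch only: maximum-lifetime, restart-timeout and
# linger-timeout are the option names proposed in this thread, not
# confirmed against a released mod_wsgi version.
WSGIDaemonProcess example processes=3 threads=2 \
    maximum-lifetime=3600 \
    graceful-timeout=600 \
    restart-timeout=140 \
    linger-timeout=1
```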
>>>> I don't know if you can see the value in distinguishing the two, but maximum-requests is sort of a "standard operating mode," so it might make sense for a mod_wsgi user to want a higher, very gentle mode of operation there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still "means business," so the same user may want a shorter, more responsive timeout there (while still allowing a decent chunk of time for being graceful to running requests). That is the case for me at least. Any chance you'd entertain that as a feature request?
>>>>
>>>> Even if not, you've been extremely helpful, thank you! And thanks for pointing me to the correct version of the documentation: I thought I was reading the current version.
>>>>
>>>> Kent
>>>>
>>>> P.S. I also have ideas for possible vertical URL partitioning, but unfortunately, our app has much cross-over by URL, so that's why I'm down this maximum-requests path...
>>>>
>>>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>>>
>>>> I'm running 4 (a very early version of it, possibly before you officially released it). We upgraded to take advantage of the amazingly helpful SIGUSR1 signaling for graceful process restarting, which we use somewhat regularly to gracefully deploy software changes (minor ones which won't matter if 2 processes have different versions loaded) without disrupting users. Thanks a ton for that!
>>>>
>>>> SIGUSR1 support was released in version 4.1.0.
>>>>
>>>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>>>
>>>> That same version has all the other stuff which was changed, so long as the actual 4.1.0 is being used and you aren't still using the repo from before the official release.
>>>>
>>>> If not sure, it's best just to upgrade to the latest version if you can.
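The graceful-deploy workflow described here, sending SIGUSR1 to the daemon processes, can be scripted. Below is a hedged sketch, not anything shipped with mod_wsgi: it assumes display-name=%{GROUP} was set so the daemon processes show up as "(wsgi:GROUP)" in process listings, and it is Linux-specific since it reads /proc directly.

```python
import os
import signal

# Hypothetical deploy helper (not part of mod_wsgi). Assumes the daemon
# process group was configured with display-name=%{GROUP}, so its
# processes appear as "(wsgi:GROUP)". Linux-only: scans /proc/<pid>/cmdline
# instead of shelling out to pgrep.
def graceful_restart(group):
    pattern = "(wsgi:" + group + ")"
    signalled = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or entry == str(os.getpid()):
            continue  # not a pid, or our own process
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # process exited, or not readable
        if pattern in cmdline:
            # Deliver the graceful-restart signal (SIGUSR1 on UNIX).
            os.kill(int(entry), signal.SIGUSR1)
            signalled.append(int(entry))
    return signalled
```

Run as a user with permission to signal the Apache-owned daemon processes; it returns the list of pids it signalled, which is empty if no process matched the group name.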
>>>> We are also multi-threading our processes (plural processes, plural threads).
>>>>
>>>> Some requests could be (validly) running for very long periods of time (database reporting, maybe even half an hour, though that would be very extreme).
>>>>
>>>> Some processes (especially those generating .pdfs, for example) hog tons of RAM, as you know, so I'd like these to eventually check their RAM back in, so to speak, by utilizing either inactivity-timeout or maximum-requests, but always in a very gentle way, since, as I mentioned, some requests might be properly running, even though for many minutes. maximum-requests seems too brutal for my use case, since the threshold request sends the process down the graceful-timeout/shutdown-timeout path, even if there are valid requests running, and then SIGKILLs. My ideal vision of "maximum-requests," since it is *primarily for memory management,* is to be very gentle, sort of a "ok, now that I've hit my threshold, at my next earliest convenience, I should die, but only once all my current requests have ended of their own accord."
>>>>
>>>> That is where, if you vertically partition those URLs out to a separate daemon process group, you can simply set a very high graceful-timeout value.
>>>>
>>>> So relying on the feature:
>>>>
>>>> """
>>>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is applied in a number of circumstances.
>>>>
>>>> When maximum-requests and this option are used together, when maximum requests is reached, rather than immediately shutting down, potentially interrupting active requests if they don't finish within the shutdown timeout, you can specify a separate graceful shutdown period. If all requests are completed within this time frame then it will shut down immediately, otherwise the normal forced shutdown kicks in.
>>>> In some respects this is just allowing a separate shutdown timeout in cases where requests could be interrupted, and avoiding it if possible.
>>>> """
>>>>
>>>> You could set:
>>>>
>>>> maximum-requests=20 graceful-timeout=600
>>>>
>>>> So as soon as it hits 20 requests, it switches to a mode where it will restart when there are no requests. You can set that timeout as high as you want, even hours, and it will not care.
>>>>
>>>> "inactivity-timeout" seems to function exactly as I want, in that it seems like it won't ever kill a process with a thread with an active request (at least, I can't get it to, even by adding a long import time;time.sleep(longtime)... it doesn't seem to die until the request is finished). But that's why the documentation made me nervous, because it implies that it *could*, in fact, kill a proc with an active request: *"For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."*
>>>>
>>>> The release notes for 4.1.0 say:
>>>>
>>>> """
>>>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in the daemon process being restarted after the idle timeout period where there are no active requests. Previously it would also interrupt a long running request. See the new request-timeout option for a way of interrupting long running, potentially blocked requests and restarting the process.
>>>> """
>>>>
>>>> I'd rather have a more gentle "maximum-requests" than "inactivity-timeout" because then, even on very heavy days (when RAM is most likely to choke), I could gracefully turn over these processes a couple of times a day, which I couldn't do with "inactivity-timeout" on an extremely heavy day.
>>>>
>>>> Hope this makes sense. I'm really asking:
>>>>
>>>> 1.
>>>> whether inactivity-timeout triggering will ever SIGKILL a process with an active request, as the docs intimate
>>>>
>>>> No, from 4.1.0 onwards.
>>>>
>>>> 2. whether there is any way to get maximum-requests to behave more gently under all circumstances
>>>>
>>>> By setting a very, very long graceful-timeout.
>>>>
>>>> 3. for your ideas/best advice
>>>>
>>>> Have a good read through the release notes for 4.1.0.
>>>>
>>>> Not necessarily useful in your case, but also look at request-timeout. It can act as a final failsafe for when things are stuck, but since it is more of a failsafe, it doesn't make use of graceful-timeout.
>>>>
>>>> Graham
>>>>
>>>> Thanks for your help!
>>>>
>>>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>>>
>>>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote:
>>>>
>>>> > Graham, the docs state: "For the purposes of this option, being idle means no new requests being received, or no attempts by current requests to read request content or generate response content for the defined period."
>>>> >
>>>> > This implies to me that a running request that is taking a long time could actually be killed as if it were idle (suppose it were fetching a very slow database query). Is this the case?
>>>>
>>>> This is the case for mod_wsgi prior to version 4.0.
>>>>
>>>> Things have changed in mod_wsgi 4.X.
>>>>
>>>> How long are your long running requests though? The inactivity-timeout was more about restarting infrequently used applications so that memory can be taken back.
>>>>
>>>> > Also, I'm looking for an ultra-conservative and graceful method of recycling memory. I've read your article on URL partitioning, which was useful, but sooner or later, one must rely on either inactivity-timeout or maximum-requests, is that accurate?
>>>> > But both these will eventually, after the graceful timeout/shutdown timeout, potentially kill active requests. It is valid for our app to handle long-running reports, so I was hoping for an ultra-safe mechanism.
>>>> > Do you have any advice here?
>>>>
>>>> The options available in mod_wsgi 4.X are much better in this area than 3.X. The changes in 4.X aren't covered in the main documentation though, and are only described in the release notes where the change was made.
>>>>
>>>> In 4.X the concept of an inactivity-timeout is now separate from the idea of a request-timeout. There is also a graceful-timeout that can be applied to maximum-requests, and some other situations as well, to allow requests to finish out properly before being more brutal. One can also signal the daemon processes to do a more graceful restart as well.
>>>>
>>>> You cannot totally avoid having to be brutal though and kill things, else you don't have a failsafe for a stuck process where all request threads were blocked on back-end services and were never going to recover. Use of multithreading in a process also complicates the implementation of request-timeout.
>>>>
>>>> Anyway, the big question is what version you are using?
>>>>
>>>> Graham
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "modwsgi" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at http://groups.google.com/group/modwsgi.
>>>> For more options, visit https://groups.google.com/d/optout.
