The application of the eviction timeout should now be fixed in the develop branch.
https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
Graham
On 03/02/2015, at 5:02 PM, Graham Dumpleton <[email protected]> wrote:
>
>
> On 3 February 2015 at 04:15, Kent Bower <[email protected]> wrote:
> On Sun, Feb 1, 2015 at 7:08 PM, Graham Dumpleton <[email protected]>
> wrote:
> Your Flask client doesn't need to know about Celery, as your web application
> accepts requests as normal and it is your Python code which would queue the
> job with Celery.
>
> Now looking back, the only configuration I can find, but which I don't know
> if it is your actual production configuration is:
>
> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800
> display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60
> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>
> Provided that you don't then start to have overall host memory issues, the
> simplest way around this whole issue is not to use a multithreaded process.
>
> What you would do is vertically partition your URL name space so that just
> the URLs which do the long running report generation would be delegated to
> single threaded processes. Everything else would keep going to the
> multithread processes.
>
> WSGIDaemonProcess rarch processes=3 threads=2
> WSGIDaemonProcess rarch-long-running processes=6 threads=1
> maximum-requests=20
>
> WSGIProcessGroup rarch
>
> <Location /suburl/of/long/running/report/generator>
> WSGIProcessGroup rarch-long-running
> </Location>
>
> You wouldn't even have to worry about the graceful-timeout on
> rarch-long-running as that is only relevant for maximum-requests where it is
> a multithreaded process.
>
> So what would happen is that when the request has finished, if
> maximum-requests is reached, the process would be restarted even before any
> new request was accepted by the process, so there is no chance of a new
> request being interrupted.
>
> You could still set an eviction-timeout of some suitably large value to allow
> you to use SIGUSR1 to be sent to processes in that daemon process group to
> shut them down.
>
> In this case, having eviction-timeout being able to be set independent of
> graceful-timeout (for maximum-requests), is probably useful and so I will
> retain the option.
>
> So is there any reason you couldn't use a daemon process group with many
> single threaded processes instead?
>
>
> This is very good to know (that single threaded procs would behave more
> ideally in these circumstances). The above was just my configuration for
> testing 'eviction-timeout'. Our software generally runs with many more
> processes and threads, on servers with maybe 16 or 32 GB RAM. And
> unfortunately, the RAM is the limiting resource here as our python app, built
> on turbo-gears, is a memory hog and we have yet to find the resources to
> dissect that. I was aiming to head in the direction of URL partitioning, but
> there are big obstacles. (Chiefly, RAM consumption would make threads=1 and
> yet more processes very difficult unless we spend the huge effort in
> dissecting the app to locate and pull the many unused memory hogging
> libraries out.)
>
> So, URL partitioning is sort of the ideal, distant solution, as well as a
> Celery-like polling solution, but out of my reach for now.
>
> Have you ever run a test where you compare the whole memory usage of your
> application where all URLs are visited, to how much memory is used if only
> the URL which generates the long running report is visited?
>
> In Django at least, a lot of stuff is lazily loaded only when a URL requiring
> it is first accessed. So even with a heavy code base, there can still be
> benefits in splitting out URLs to their own processes because the whole code
> base wouldn't be loaded due to the lazy loading.
>
> So do you have any actual memory figures from doing that?
>
> How many URLs are there that generate these reports vs those that don't, or
> is that all the whole application does?
>
> Are your most frequently visited URLs those generating the reports or
> something else?
>
> Another question for multithreaded graceful-timeout with maximum-requests:
> during a period of heavy traffic, it seems the graceful-timeout setting just
> pushes the real timeout until shutdown-timeout because, if heavy enough,
> you'll be getting requests during graceful-timeout. That diminishes the
> fidelity of "graceful-timeout." Do you see where I'm coming from (even if
> you're happy with the design and don't want to mess with it, which I'd
> understand)?
>
>
> Ok, here is the log demonstrating the troubles I saw with eviction-timeout.
> For demonstration purposes, here is the simplified directive I'm using:
>
> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP}
> graceful-timeout=140 eviction-timeout=60
> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>
> Here is the log:
>
> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface:
> mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
> [Mon Feb 02 11:36:16 2015] [notice] Digest: generating secret for digest
> authentication ...
> [Mon Feb 02 11:36:16 2015] [notice] Digest: done
> [Mon Feb 02 11:36:16 2015] [info] APR LDAP: Built with OpenLDAP LDAP SDK
> [Mon Feb 02 11:36:16 2015] [info] LDAP: SSL support available
> [Mon Feb 02 11:36:16 2015] [info] Init: Seeding PRNG with 256 bytes of entropy
> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary RSA private keys
> (512/1024 bits)
> [Mon Feb 02 11:36:16 2015] [info] Init: Generating temporary DH parameters
> (512/1024 bits)
> [Mon Feb 02 11:36:16 2015] [info] Shared memory session cache initialised
> [Mon Feb 02 11:36:16 2015] [info] Init: Initializing (virtual) servers for SSL
> [Mon Feb 02 11:36:16 2015] [info] Server: Apache/2.2.3, Interface:
> mod_ssl/2.2.3, Library: OpenSSL/0.9.8e-fips-rhel5
> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Starting process
> 'rarch' with uid=48, gid=48 and threads=1.
> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Python home
> /home/rarch/tg2env.
> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Initializing Python.
> [Mon Feb 02 11:36:16 2015] [notice] Apache/2.2.3 (CentOS) configured --
> resuming normal operations
> [Mon Feb 02 11:36:16 2015] [info] Server built: Aug 30 2010 12:28:40
> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447): Attach interpreter ''.
> [Mon Feb 02 11:36:16 2015] [info] mod_wsgi (pid=29447, process='rarch',
> application=''): Loading WSGI script
> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
> [Mon Feb 02 11:39:13 2015] [info] mod_wsgi (pid=29447): Process eviction
> requested, waiting for requests to complete 'rarch'.
> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Daemon process
> graceful timer expired 'rarch'.
> [Mon Feb 02 11:41:00 2015] [info] mod_wsgi (pid=29447): Shutdown requested
> 'rarch'.
> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Aborting process
> 'rarch'.
> [Mon Feb 02 11:41:05 2015] [info] mod_wsgi (pid=29447): Exiting process
> 'rarch'.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has
> died, deregister and restart it.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=29447): Process 'rarch' has
> been deregistered and will no longer be monitored.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Starting process
> 'rarch' with uid=48, gid=48 and threads=1.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Python home
> /home/rarch/tg2env.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Initializing Python.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331): Attach interpreter ''.
> [Mon Feb 02 11:41:06 2015] [info] mod_wsgi (pid=31331, process='rarch',
> application=''): Loading WSGI script
> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>
> The process was signaled at 11:39:13 with eviction-timeout=60 but 11:40:13
> came and passed and nothing happened until 107 seconds passed, at which time
> graceful timer expired.
>
>
> Next, I changed the parameters a little:
>
> WSGIDaemonProcess rarch processes=1 threads=1 display-name=%{GROUP}
> eviction-timeout=30 graceful-timeout=240
> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>
> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Starting process
> 'rarch' with uid=48, gid=48 and threads=1.
> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Python home
> /home/rarch/tg2env.
> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Initializing Python.
> [Mon Feb 02 12:06:57 2015] [notice] Apache/2.2.3 (CentOS) configured --
> resuming normal operations
> [Mon Feb 02 12:06:57 2015] [info] Server built: Aug 30 2010 12:28:40
> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381): Attach interpreter ''.
> [Mon Feb 02 12:06:57 2015] [info] mod_wsgi (pid=3381, process='rarch',
> application=''): Loading WSGI script
> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
> [Mon Feb 02 12:07:19 2015] [info] mod_wsgi (pid=3381): Process eviction
> requested, waiting for requests to complete 'rarch'.
> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Daemon process
> graceful timer expired 'rarch'.
> [Mon Feb 02 12:11:01 2015] [info] mod_wsgi (pid=3381): Shutdown requested
> 'rarch'.
> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Aborting process
> 'rarch'.
> [Mon Feb 02 12:11:06 2015] [info] mod_wsgi (pid=3381): Exiting process
> 'rarch'.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has
> died, deregister and restart it.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=3381): Process 'rarch' has
> been deregistered and will no longer be monitored.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Starting process
> 'rarch' with uid=48, gid=48 and threads=1.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Python home
> /home/rarch/tg2env.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Initializing Python.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028): Attach interpreter ''.
> [Mon Feb 02 12:11:07 2015] [info] mod_wsgi (pid=7028, process='rarch',
> application=''): Loading WSGI script
> '/home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py'.
>
>
> So, for me, eviction-timeout is apparently being ignored...
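The delays in the two logs above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `elapsed_seconds` is made up for illustration, and the timestamps are copied from the "Process eviction requested" and "graceful timer expired" log lines.

```python
from datetime import datetime

# Timestamp layout used in the Apache error log lines quoted above.
LOG_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start, end):
    """Return whole seconds between two Apache error log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return int(delta.total_seconds())


# First run: eviction-timeout=60, graceful-timeout=140.
first = elapsed_seconds("Mon Feb 02 11:39:13 2015", "Mon Feb 02 11:41:00 2015")

# Second run: eviction-timeout=30, graceful-timeout=240.
second = elapsed_seconds("Mon Feb 02 12:07:19 2015", "Mon Feb 02 12:11:01 2015")

print(first, second)
```

In both runs the gap between the eviction request and the graceful timer expiring is longer than the configured eviction-timeout (60 and 30 seconds respectively).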
>
> The background monitor thread which monitors for expiry wasn't taking into
> consideration the eviction timeout period being able to be less than the
> graceful timeout. I didn't see a problem as I was also setting request
> timeout, which causes the way the monitor thread works to be different,
> waking up every second regardless. I will work on a fix for that.
>
> Another issue for consideration is if a graceful timeout is already in
> progress and a signal comes in for eviction, which timeout wins? Right now
> the eviction time will trump the graceful time if already set by maximum
> requests. The converse isn't true though in that if already in eviction cycle
> and maximum requests arrives, it wouldn't be trumped by graceful timeout. So
> eviction time had authority given that it was triggered by explicit user
> signal. It does mean that the signal could effectively extend whatever
> graceful time was in progress.
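The precedence rule described above can be modelled as a small function. This is a toy model and not mod_wsgi's code: the function name, the tuple representation and the trigger names are all hypothetical, and what happens when a second signal arrives during an eviction is not stated above, so the sketch simply assumes the existing deadline is kept.

```python
def next_deadline(current, trigger, now, graceful_timeout, eviction_timeout):
    """Return the (source, deadline) pair in force after a restart trigger.

    current: None, or the (source, deadline) pair already in progress.
    trigger: "signal" (explicit eviction) or "maximum-requests".
    """
    if current is not None and current[0] == "eviction":
        # An eviction already in progress is not trumped by anything.
        # (Keeping it on a repeated signal is an assumption of this sketch.)
        return current
    if trigger == "signal":
        # An explicit signal trumps a graceful timeout already in progress,
        # which can push the deadline later than it was.
        return ("eviction", now + eviction_timeout)
    if current is not None:
        # A graceful timeout is already running; leave it alone.
        return current
    return ("graceful", now + graceful_timeout)


# maximum-requests trips at t=0, then a signal arrives at t=100.
state = next_deadline(None, "maximum-requests", 0, 140, 60)
state = next_deadline(state, "signal", 100, 140, 60)
print(state)
```

With graceful-timeout=140 and eviction-timeout=60, the signal at t=100 moves the deadline from 140 to 160, which is the "extend" effect mentioned above.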
>
> Graham
>
> Thanks again for all your time and help,
> Kent
>
>
>
> Note that since only a sub set of URLs would go to the daemon process group,
> the memory usage profile will change as you aren't potentially loading the
> complete application code into those processes and only those needed for that
> URL and that report. So it could use up less memory than application as a
> whole, allowing you to have multiple single threaded processes with no issue.
>
> Graham
>
> On 31/01/2015, at 12:31 AM, Kent <[email protected]> wrote:
>
>> Thanks for your reply and recommendations. We're aware of the issues, but I
>> didn't give the full picture for brevity's sake. The reports are user
>> generated reports. Ultimately, the users know whether the reports should
>> return quickly (which many, many will), or whether they are long-running.
>> There is no way for the application to know that, so to avoid some sort of
>> polling (which we've done in the past and was a pain in the rear to users),
>> the design is to allow the user to decide whether to run the report in the
>> background or "foreground" via a check box. Since most reports will return
>> in a matter of a minute or so, we wanted to avoid the pain of making them
>> poll, but I need to look at Celery. However, I'm not comfortable punishing
>> users for accidentally choosing foreground on a long-running report. That
>> is, not for an automatic turn-over mechanism like maximum-requests or
>> inactivity-timeout. In my mind, those are inherently different than
>> something like a SIGUSR1 mechanism because the former are automatic.
>>
>> So, while admitting there are edge cases we are using that don't have a
>> perfect solution (or even admitting we need a better mechanism in that
>> case), it still seems to me mod_wsgi should be somewhat agnostic of design
>> choices. In other words, when it comes to automatic turning over of
>> processes, it seems mod_wsgi shouldn't be involved with length of time
>> considerations, except to allow the user to specify timeouts. See, the long
>> running reports are only one of my concerns: we also fight with database
>> locks sometimes, held by another application attached to the same database
>> and wholly out of our control. Sometimes those locks can be held for many
>> minutes on a request that normally should complete within seconds. There
>> too, it seems mod_wsgi should be very gentle in the automatic turnover cases.
>>
>> Thanks for pointing to Celery. I really wonder whether I can get a message
>> broker to work with Adobe Flash, our current client, but I haven't looked
>> into this much yet.
>>
>> Also, my apologies if you believe this to have been a waste of time on your
>> part. You've been extremely helpful, though and I'm quite thankful for your
>> time! I understand you not wanting to redesign the shutdown-timeout thing
>> and mess with what otherwise isn't broken. Would you still like me to post
>> the apache debug logs regarding 'eviction-timeout' or have you changed your
>> mind about releasing that? (In which case, extra apologies.)
>>
>> Kent
>>
>>
>>
>>
>> On Friday, January 30, 2015 at 6:34:28 AM UTC-5, Graham Dumpleton wrote:
>> If you have web requests generating reports which take 40 minutes to run,
>> you are going the wrong way about it.
>>
>> What would be regarded as best practice for long running requests is to use
>> a task queuing system to queue up the task to be run and run it in a
>> distinct set of processes to the web server. Your web request can then
>> return immediately, with some sort of polling system used as necessary to
>> check the progress of the task and allow the result to be downloaded when
>> complete. By using a separate system to run the tasks, it doesn't matter
>> whether the web server is restarted as the tasks will still run and after
>> the web server is restarted, a user can still check on progress of the tasks
>> and get back his response.
>>
>> The most common such task execution system for doing this sort of thing is
>> Celery.
>>
>> So it is because you aren't using the correct tool for the job here that you
>> are fighting against things like timeouts in the web server. No web server
>> is really a suitable environment to be used as an in process task execution
>> system. The web server should handle requests quickly and offload longer
>> processing tasks to a separate task system which is purpose built for handling
>> the management of long running tasks.
>>
>> I am not inclined to keep fiddling how the timeouts work now I understand
>> what you are trying to do. I am even questioning now whether I should have
>> introduced the separate eviction timeout I already did given that it is
>> turning out to be a questionable use case.
>>
>> I would really recommend you look at re-architecting how you do things. I
>> don't think I would have any trouble finding others on the list who would
>> advise the same thing and who could also give you further advice on using
>> something like Celery instead for task execution.
>>
>> Graham
>>
>> On 29/01/2015, at 7:30 AM, Kent <[email protected]> wrote:
>>
>> Ok, I plan to run those tests with debug and post, but please, in the
>> meantime:
>>
>> For our app, not interrupting existing requests is a higher priority than
>> being able to accept new requests, particularly since we typically run many
>> wsgi processes, each with a handful of threads. So, I'm not really
>> concerned about maintaining always available threads (statistically, I will
>> be fine... that isn't the issue for me).
>>
>> In these circumstances, it would be much better for all these triggering
>> events (SIGUSR1, maximum-requests, or inactivity-timeout, etc.) to
>> immediately stop accepting new requests and "concentrate" on shutting down.
>> (Unless that means requests waiting in apache are terminated because they
>> were queued for this particular process, but I doubt apache has already
>> determined the request's process if none are available, has it?) With high
>> graceful-timeout/eviction-timeout and low shutdown-timeout, I run a pretty
>> high risk of accepting a new request at the tail end of graceful-timeout or
>> eviction-timeout, only to have it basically doomed to ungraceful death
>> because many of our requests are long running (very often well over 5 or 10
>> sec).
>>
>> I guess that's why, through experimentation with SIGUSR1 a few years back, I
>> ended up "graceful-timeout=5 shutdown-timeout=300" ... the opposite of how
>> it would default, because this works well when trying to signal these to
>> recycle themselves: they basically immediately stop accepting new requests
>> so your "guaranteed" graceful timeout is 300. It seems I have no way to
>> "guarantee" a very large graceful timeout for each and every request, even
>> if affected by maximum-requests or inactivity-timeout, and specify a
>> different (lower) one for SIGUSR1 because the only truly guaranteed lifetime
>> in seconds is "shutdown-timeout," is that accurate?
>>
>> The ideal for our app, which may accept certain requests that run for several
>> minutes, is this:
>> - if maximum-requests or inactivity-timeout is hit, stop taking new requests
>>   immediately and shut down as soon as possible, but give existing requests
>>   basically all the time they need to finish (say, up to 40 minutes (for
>>   long-running db reports)).
>> - if SIGUSR1, stop taking new requests immediately and shut down as soon as
>>   possible, but give existing requests a really good chance to complete, maybe
>>   3-5 minutes, but not the 40 minutes, because this is slightly more urgent
>>   (was triggered manually and a user is monitoring/waiting for turnover and
>>   wants new code in place).
>> I don't think I can accomplish the above if I understand the design
>> correctly because a request may have been accepted at the tail end of
>> graceful-timeout/eviction-timeout and so is only guaranteed a lifetime of
>> shutdown-timeout, regardless of what the trigger was (SIGUSR1 vs. automatic).
>>
>> Is my understanding of this accurate?
>>
>>
>>
>> On Tuesday, January 27, 2015 at 9:48:01 PM UTC-5, Graham Dumpleton wrote:
>> Can you ensure that LogLevel is set to at least info and provide what
>> messages are in the Apache error log file.
>>
>> If I use:
>>
>> $ mod_wsgi-express start-server hack/sleep.wsgi --log-level=debug
>> --verbose-debugging --eviction-timeout 30 --graceful-timeout 60
>>
>> which is equivalent to:
>>
>> WSGIDaemonProcess … graceful-timeout=60 eviction-timeout=30
>>
>> and fire a request against application that sleeps a long time I see in the
>> Apache error logs at the time of the signal:
>>
>> [Wed Jan 28 13:34:34 2015] [info] mod_wsgi (pid=29639): Process eviction
>> requested, waiting for requests to complete 'localhost:8000'.
>>
>> At the end of the 30 seconds given by the eviction timeout I see:
>>
>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Daemon process
>> graceful timer expired 'localhost:8000'.
>> [Wed Jan 28 13:35:05 2015] [info] mod_wsgi (pid=29639): Shutdown requested
>> 'localhost:8000'.
>>
>> Up till that point the process would still have been accepting new requests
>> and was waiting for a point where there were no active requests to allow it to
>> shut down.
>>
>> As the timeout tripped at 30 seconds, it then instead goes into the more
>> brutal shutdown process. No new requests are accepted from this point.
>>
>> For my setup the shutdown-timeout defaults to 5 seconds and because the
>> request still hadn't completed within 5 seconds, then the process is exited
>> anyway and allowed to shutdown.
>>
>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Aborting process
>> 'localhost:8000'.
>> [Wed Jan 28 13:35:10 2015] [info] mod_wsgi (pid=29639): Exiting process
>> 'localhost:8000'.
>>
>> Because the application never returned a response, that results in the
>> Apache child worker who was trying to talk to the daemon process seeing a
>> truncated response.
>>
>> [Wed Jan 28 13:35:10 2015] [error] [client 127.0.0.1] Truncated or oversized
>> response headers received from daemon process 'localhost:8000':
>> /tmp/mod_wsgi-localhost:8000:502/htdocs/
>>
>> When the Apache parent process notices the daemon process has died, it
>> cleans up and starts a new one.
>>
>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process
>> 'localhost:8000' has died, deregister and restart it.
>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29639): Process
>> 'localhost:8000' has been deregistered and will no longer be monitored.
>> [Wed Jan 28 13:35:11 2015] [info] mod_wsgi (pid=29764): Starting process
>> 'localhost:8000' with threads=5.
>>
>> So the shutdown phase specified by shutdown-timeout is subsequent to
>> eviction-timeout. It is one last chance to shutdown during a time that no
>> new requests are accepted in case it is the constant flow of requests that
>> is preventing it, rather than one long running request.
>>
>> The shutdown-timeout should always be kept quite short because no new
>> requests will be accepted during that time. So changing it from the default
>> isn't something one would normally do.
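The ordering of the two phases described above can be written down as a short timeline calculation. This is a sketch of the description only, not of mod_wsgi's implementation; the function and key names are hypothetical, and it assumes one request stays active for the whole period.

```python
def shutdown_timeline(signal_time, eviction_timeout, shutdown_timeout=5):
    """Return when each phase ends, in seconds, for a still-busy process.

    stop_accepting: the eviction timer expires, so no new requests are
                    accepted from this point.
    abort:          shutdown-timeout has also passed and the process exits
                    even though the request has not completed.
    """
    stop_accepting = signal_time + eviction_timeout
    abort = stop_accepting + shutdown_timeout
    return {"stop_accepting": stop_accepting, "abort": abort}


# Values from the example above: eviction-timeout=30 and the default
# shutdown-timeout of 5 seconds, with the signal arriving at t=0.
timeline = shutdown_timeline(0, 30)
print(timeline)
```

The log above shows the timer expiring at 13:35:05 for a signal logged at 13:34:34, so the real timings can lag the nominal ones by about a second.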
>>
>> Graham
>>
>> On 28/01/2015, at 3:02 AM, Kent <[email protected]> wrote:
>>
>> Let me be more specific. I'm having a hard time getting this to test as I
>> expected. Here is my WSGIDaemonProcess directive:
>>
>> WSGIDaemonProcess rarch processes=3 threads=2 inactivity-timeout=1800
>> display-name=%{GROUP} graceful-timeout=140 eviction-timeout=60
>> python-eggs=/home/rarch/tg2env/lib/python-egg-cache
>>
>> I put a 120 sec sleep in one of the processes' requests and then SIGUSR1
>> (Linux) all three processes. The two inactive ones immediately restart, as
>> I expect. However, the 3rd (sleeping) one is allowed to run past the 60
>> second eviction-timeout and runs straight to the graceful-timeout before it
>> is terminated. Shouldn't it have been killed at 60 sec?
>>
>> (And then, as my previous question, how does shutdown-timeout factor into
>> all this?)
>>
>> Thanks again!
>> Kent
>>
>>
>>
>> On Tuesday, January 27, 2015 at 9:34:12 AM UTC-5, Kent wrote:
>> I think I might understand the difference between 'graceful-timeout' and
>> 'shutdown-timeout', but can you please just clarify the difference? Are
>> they additive?
>>
>> Also, will 'eviction-timeout' interact with either of those, or simply
>> override them?
>>
>> Thanks,
>> Kent
>>
>> On Monday, January 26, 2015 at 12:44:13 AM UTC-5, Graham Dumpleton wrote:
>> Want to give:
>>
>> https://github.com/GrahamDumpleton/mod_wsgi/archive/develop.tar.gz
>>
>> a go?
>>
>> The WSGIDaemonProcess directive is 'eviction-timeout'. For mod_wsgi-express
>> the command line option is '--eviction-timeout'.
>>
>> So the terminology I am using around this is that sending a signal is like
>> forcibly evicting the WSGI application, allowing the process to be restarted.
>> At least this way can have an option name that is distinct enough from
>> generic 'restart' so as not to be confusing.
>>
>> Graham
>>
>> On 21/01/2015, at 11:15 PM, Kent <[email protected]> wrote:
>>
>>
>> On Tuesday, January 20, 2015 at 5:53:26 PM UTC-5, Graham Dumpleton wrote:
>>
>> On 20/01/2015, at 11:50 PM, Kent <[email protected]> wrote:
>>
>> On Sunday, January 18, 2015 at 12:43:08 AM UTC-5, Graham Dumpleton wrote:
>> There are a few possibilities here of how this could be enhanced/changed.
>>
>> The problem with maximum-requests is that it can be dangerous. People can
>> set it too low and when their site gets a big spike of traffic then the
>> processes can be restarted too quickly only adding to the load of the site
>> and causing things to slow down and hamper their ability to handle the
>> spike. This is where setting a longer amount of time for graceful-timeout
>> helps because you can set it to be quite large. The use of maximum-requests
>> can still be like using a hammer though, and one which can be applied
>> unpredictably.
>>
>> Yes, I can see that. (It may be overkill, but you could default a separate
>> minimum-lifetime parameter so only users who specifically mess with that as
>> well as maximum-requests shoot themselves in the foot, but it is starting to
>> get confusing with all the different timeouts, I'll agree there...)
>>
>>
>> The minimum-lifetime option is an interesting idea. It may have to do
>> nothing by default to avoid conflicts with existing expected behaviour.
>>
>>
>> The maximum-requests option also doesn't help in the case where you are
>> running background threads which do stuff and it is them and not the number
>> of requests coming in that dictate things like memory growth that you want
>> to counter.
>>
>>
>> True, but solving with maximum lifetime... well, actually, solving memory
>> problems with any of these mechanisms isn't measuring the heart of the
>> problem, which is RAM. I imagine there isn't a good way to measure RAM or
>> you would have added that option by now. Seems what we are truly after for
>> the majority of these isn't how many requests or how long it's been up, etc,
>> but how much RAM it is taking (or perhaps, optionally, average RAM per
>> thread, instead). If my process exceeds consuming 1.5GB, then trigger a
>> graceful restart at the next appropriate convenience, being gentle to
>> existing requests. That may be arguably the most useful parameter.
>>
>>
>> The problem with calculating memory is that there isn't one cross platform
>> portable way of doing it. On Linux you have to dive into the /proc file
>> system. On MacOS X you can use C API calls. On Solaris I think you again
>> need to dive into a /proc file system but it obviously has a different file
>> structure for getting details out compared to Linux. Adding such cross
>> platform stuff in gets a bit messy.
>>
>> What I was moving towards as an extension of the monitoring stuff I am doing
>> for mod_wsgi was to have a special daemon process you can setup which has
>> access to some sort of management API. You could then create your own Python
>> script that runs in that and which using the management API can get daemon
>> process pids and then use Python psutil to get memory usage on periodic
>> basis and then you decide if process should be restarted and send it a
>> signal to stop, or management API provided which allows you to notify in
>> some way, maybe by signal, or maybe using shared memory flag, that daemon
>> process should shut down.
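A monitoring script of the kind described above might look roughly like this on Linux. The management API mentioned above does not exist in this sketch, so the daemon process pid is assumed to be supplied by the caller; it also reads /proc directly instead of using psutil, and every function name here is made up for illustration.

```python
import os
import signal


def rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of a Linux /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def should_restart(status_text, limit_kib):
    """True if the resident set size reported in status_text exceeds limit_kib."""
    rss = rss_kib(status_text)
    return rss is not None and rss > limit_kib


def check_and_signal(pid, limit_kib):
    """Check one process and send it SIGUSR1 if it is over the limit.

    Linux only; assumes SIGUSR1 is the graceful restart signal in use.
    """
    with open("/proc/%d/status" % pid) as handle:
        text = handle.read()
    if should_restart(text, limit_kib):
        os.kill(pid, signal.SIGUSR1)
        return True
    return False
```

Calling `check_and_signal()` periodically for each daemon process pid would give the "restart once memory passes a threshold" behaviour asked about earlier, with the timeouts discussed in this thread then governing how gently the process goes away.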
>>
>>
>> I figured there was something making that a pain...
>>
>> So the other option I have contemplated adding a number of times is one
>> to periodically restart the process. The way this would work is that a
>> process restart would be done periodically based on what time was specified.
>> You could therefore say the restart interval was 3600 and it would restart
>> the process once an hour.
>>
>> The start of the time period for this would either be, when the process was
>> created, if any Python code or a WSGI script was preloaded at process start
>> time. Or, it would be from when the first request arrived if the WSGI
>> application was lazily loaded. This restart-interval could be tied to the
>> graceful-timeout option so that you can set an extended period if you want
>> to try and ensure that requests are not interrupted.
>>
>> We just wouldn't want it to die having never even served a single request,
>> so my vote would be against the birth of the process as the beginning point
>> (and, rather, at first request).
>>
>>
>> It would effectively be from first request if lazily loaded. If preloaded
>> though, as background threads could be created which do stuff and consume
>> memory over time, would then be from when process started, ie., when Python
>> code was preloaded.
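The rule for when the restart clock starts can be stated compactly. This is a sketch of the proposal as described, not of any existing mod_wsgi option; the function and parameter names are hypothetical and times are plain seconds.

```python
def restart_deadline(process_start, first_request, preloaded, interval):
    """Return the time at which a periodic restart would fall due.

    If code was preloaded, the clock starts when the process starts.
    If the application is lazily loaded, it starts at the first request,
    so a process that has served nothing has no deadline yet (None).
    """
    start = process_start if preloaded else first_request
    if start is None:
        return None
    return start + interval


# Process started at t=0, first request at t=120, interval of one hour.
print(restart_deadline(0, 120, False, 3600))
print(restart_deadline(0, 120, True, 3600))
```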
>>
>>
>> But then for preloaded, processes life-cycle themselves for no reason
>> throughout inactive periods like maybe overnight. That's not the end of the
>> world, but I wonder if we're catering to the wrong design. (These are, after
>> all, webserver processes, so it seems a fair assumption that they exist
>> primarily to handle requests, else why even run under apache?) My vote, for
>> what it's worth, would still be timed from first request, but I probably
>> won't use that particular option. Either way would be useful for some I'm
>> sure.
>>
>>
>> Now we have the ability to send the process a graceful restart signal (usually
>> SIGUSR1), to force an individual process to restart.
>>
>> Right now this is tied to the graceful-timeout duration as well, which as
>> you point out, would perhaps be better off having its own time duration for
>> the notional grace period.
>>
>> Using the name restart-timeout for this could be confusing if I have a
>> restart interval option.
>>
>>
>> In my opinion, SIGUSR1 is different from the automatic parameters because it
>> was (most likely) triggered by user intervention, so that one should ideally
>> have its own parameter. If that is the case and this parameter becomes
>> dedicated to SIGUSR1, then the least ambiguous name I can think of is
>> sigusr1-timeout.
>>
>>
>> Except that it isn't guaranteed to be called SIGUSR1. Technically it could
>> be a different signal depending on the platform that Apache runs on. But then,
>> as far as I know all UNIX systems do use SIGUSR1.
>>
>>
>> In any case, they are "signals": you like signal-timeout? (Also could be
>> taken ambiguously, but maybe less so than restart-timeout?)
>>
>> I also have another type of process restart I am trying to work out how to
>> accommodate and the naming of options again complicates the problem. In this
>> case we want to introduce an artificial restart delay.
>>
>> This would be an option to combat the problem which is being caused by
>> Django 1.7 in that WSGI script file loading for Django isn't stateless. If a
>> transient problem occurs, such as the database not being ready, the loading
>> of the WSGI script file can fail. On the next request an attempt is made to
>> load it again but now Django kicks up a stink because it was halfway through setting
>> things up last time when it failed and the setup code cannot be run a second
>> time. The result is that the process then keeps failing.
>>
>> The idea of the restart delay option therefore is to allow you to set it to
>> a number of seconds, normally just 1. If set like that, if a WSGI script file
>> import fails, it will effectively block for the delay specified and when
>> over it will kill the process so the whole process is thrown away and the
>> WSGI script file can be reloaded in a fresh process. This gets rid of the
>> problem of Django initialisation not being able to be retried.
>>
>>
>> (We are using turbogears... I don't think I've seen something like that
>> happen, but rarely have seen start up anomalies.)
>>
>> A delay is needed to avoid an effective fork bomb, where a WSGI script file
>> not loading with high request throughput would cause a constant cycle of
>> processes dying and being replaced. It is possible it wouldn't be as bad as
>> I think as Apache only checks for dead processes to replace once a second,
>> but I still prefer my own failsafe in case that changes.
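
A minimal sketch of the restart delay idea described above, written in plain
Python rather than taken from mod_wsgi itself. The function name, its
parameters, and the injectable sleep/exit hooks are all hypothetical and exist
only so the behaviour can be shown and exercised in isolation.

```python
import sys
import time


def load_script_or_exit(loader, restart_delay,
                        sleep=time.sleep, exit_process=sys.exit):
    """Try to load the WSGI script; on failure optionally delay, then exit.

    loader        -- callable that imports/loads the application (hypothetical)
    restart_delay -- seconds to block before exiting; 0 disables the feature
    """
    try:
        return loader()
    except Exception:
        if restart_delay <= 0:
            # Default behaviour as described: the failure is ignored here
            # and loading is attempted again on the next request.
            return None
        # Block first so that a persistently failing script cannot cause
        # processes to die and be replaced in a tight loop.
        sleep(restart_delay)
        # Throw the whole process away so the next load starts clean.
        exit_process(1)
```

The delay comes before the exit on purpose: the process manager only gets a
dead process to replace once the delay has elapsed, which bounds how often a
broken script can trigger a replacement.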
>>
>> I am therefore totally fine with a separate graceful time period for when
>> SIGUSR1 is used, I just need to juggle these different features and come up
>> with an option naming scheme that makes sense.
>>
>> How about then that I add the following new options:
>>
>> maximum-lifetime - Similar to maximum-requests in that it will cause the
>> processes to be shutdown and restarted, but in this case it will occur based
>> on the time period given as argument, measured from the first request or
>> when the WSGI script file or any other Python code was preloaded, that is,
>> in the latter case when the process was started.
>>
>> restart-timeout - Specifies a separate grace period for when the process
>> is being forcibly restarted using the graceful restart signal. If
>> restart-timeout is not specified and graceful-timeout is specified, then the
>> value of graceful-timeout is used. If neither is specified, then the
>> restart signal will behave similarly to the process being sent a SIGINT.
>>
>> linger-timeout - When a WSGI script file, or other Python code, is being
>> imported by mod_wsgi directly, if that fails the default is that the error
>> is ignored. For a WSGI script file, reloading will be attempted on the next
>> request. But if preloading code then it will fail and merely be logged. If
>> linger-timeout is set to a non-zero value, with the value being in
>> seconds, then the daemon process will instead be shut down and restarted to
>> try and allow a successful reloading of the code to occur if it was a
>> transient issue. To avoid a fork bomb if it is a persistent issue, a delay
>> will be introduced based on the value of the linger-timeout option.
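
The fallback rule proposed for restart-timeout can be sketched as below. This
is only an illustration of the rule as worded in this message; the function
name is hypothetical, and None is used here to stand for "option not
specified".

```python
from typing import Optional


def restart_grace_period(restart_timeout: Optional[int],
                         graceful_timeout: Optional[int]) -> Optional[int]:
    """Grace period (seconds) to apply when the restart signal arrives.

    Returns None when neither option is set, meaning there is no grace
    period and the signal is handled much like a SIGINT.
    """
    if restart_timeout is not None:
        # A dedicated value for the restart signal takes priority.
        return restart_timeout
    if graceful_timeout is not None:
        # Otherwise fall back to the general graceful-timeout.
        return graceful_timeout
    return None
```

So restart_grace_period(60, 600) gives the short, signal-specific period,
while restart_grace_period(None, 600) falls back to the general one.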
>>
>> How does that all sound, if it makes sense that is. :-)
>>
>>
>>
>> That sounds absolutely great! How would I get on the notification cc: of
>> the ticket or whatever so I'd be informed of progress on that?
>>
>> These days my turn around time is pretty quick so long as I am happy and
>> know what to change and how. So I just need to think a bit more about it and
>> get some day job stuff out of the way before I can do something.
>>
>> So don't be surprised if you simply get a reply to this email within a week
>> pointing at a development version to try.
>>
>>
>> Well tons of thanks again.
>>
>> Graham
>>
>>
>> On 17/01/2015, at 12:27 AM, Kent <[email protected]> wrote:
>>
>> Thanks again. Yes, I did take our current version from the repo because you
>> hadn't released the SIGUSR1 bit yet... I should upgrade now.
>>
>> As for the very long graceful-timeout, I was skirting around that solution
>> because I like where the setting is currently for SIGUSR1. I would like to
>> ask, "Is there a way to indicate a different graceful-timeout for handling
>> SIGUSR1 vs. maximum-requests?" but I already have the answer from the
>> release notes: "No."
>>
>> I don't know if you can see the value in distinguishing the two, but
>> maximum-requests is sort of a "standard operating mode," so it might make
>> sense for a modwsgi user to want a higher, very gentle mode of operation
>> there, whereas SIGUSR1, while beautifully more graceful than SIGKILL, still
>> "means business," so the same user may want a shorter responsive timeout
>> there (while still allowing a decent chunk of time for being graceful to
>> running requests). That is the case for me at least. Any chance you'd
>> entertain that as a feature request?
>>
>> Even if not, you've been extremely helpful, thank you! And thanks for
>> pointing me to the correct version of documentation: I thought I was reading
>> current version.
>> Kent
>>
>> P.S. I also have ideas for possible vertical URL partitioning, but
>> unfortunately, our app has much cross-over by URL, so that's why I'm down
>> this maximum-requests path...
>>
>>
>> On Friday, January 16, 2015 at 4:54:50 AM UTC-5, Graham Dumpleton wrote:
>>
>> On 16/01/2015, at 7:28 AM, Kent <[email protected]> wrote:
>>
>> I'm running 4 (a very early version of it, possibly before you officially
>> released it). We upgraded to take advantage of the amazingly-helpful
>> SIGUSR1 signaling for graceful process restarting, which we use somewhat
>> regularly to gracefully deploy software changes (minor ones which won't
>> matter if 2 processes have different versions loaded) without disrupting
>> users. Thanks a ton for that!
>>
>> SIGUSR1 support was released in version 4.1.0.
>>
>> http://modwsgi.readthedocs.org/en/master/release-notes/version-4.1.0.html
>>
>> That same version has all the other changes as well, so long as the actual
>> 4.1.0 release is being used and you aren't still using the repo from before
>> the official release.
>>
>> If not sure, it is best to just upgrade to the latest version if you can.
>>
>> We are also multi-threading our processes (plural processes, plural threads).
>>
>> Some requests could be (validly) running for very long periods of time
>> (database reporting, maybe even half hour, though that would be very
>> extreme).
>>
>> Some processes (especially those generating .pdfs, for example), hog tons of
>> RAM, as you know, so I'd like these to eventually check their RAM back in,
>> so to speak, by utilizing either inactivity-timeout or maximum-requests, but
>> always in a very gentle way, since, as I mentioned, some requests might be
>> properly running, even though for many minutes. maximum-requests seems too
>> brutal for my use-case since the threshold request sends the process down
>> the graceful-timeout/shutdown-timeout, even if there are valid processes
>> running and then SIGKILLs. My ideal vision of "maximum-requests," since it
>> is primarily for memory management, is to be very gentle, sort of a "ok, now
>> that I've hit my threshold, at my next earliest convenience, I should die,
>> but only once all my current requests have ended of their own accord."
>>
>> That is where, if you vertically partition those URLs out to a separate
>> daemon process group, you can simply set a very high graceful-timeout value.
>>
>> So relying on the feature:
>>
>> """
>> 2. Add a graceful-timeout option to WSGIDaemonProcess. This option is
>> applied in a number of circumstances.
>>
>> When maximum-requests and this option are used together, when maximum
>> requests is reached, rather than immediately shutdown, potentially
>> interrupting active requests if they don't finish within shutdown timeout,
>> can specify a separate graceful shutdown period. If all requests are
>> completed within this time frame then will shutdown immediately, otherwise
>> normal forced shutdown kicks in. In some respects this is just allowing a
>> separate shutdown timeout on cases where requests could be interrupted and
>> could avoid it if possible.
>> """
>>
>> You could set:
>>
>> maximum-requests=20 graceful-timeout=600
>>
>> So as soon as it hits 20 requests, it switches to a mode where it will
>> restart as soon as there are no active requests. You can set that timeout as
>> high as you want, even hours, and it will not care.
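
To make the combined behaviour of maximum-requests and graceful-timeout
concrete, here is a small decision function that follows the release note
quoted above. It is a sketch of the described behaviour, not mod_wsgi code;
the function name, argument names, and the returned labels are made up for
illustration.

```python
def next_action(requests_handled, active_requests, seconds_since_limit,
                maximum_requests=20, graceful_timeout=600):
    """Decide what a daemon process does under the described rules."""
    if requests_handled < maximum_requests:
        # Limit not reached yet: nothing special happens.
        return "keep-running"
    if active_requests == 0:
        # Limit reached and nothing in flight: restart straight away.
        return "restart"
    if seconds_since_limit < graceful_timeout:
        # Limit reached but requests are still running: let them finish.
        return "wait"
    # Grace period used up: the normal forced shutdown takes over.
    return "forced-shutdown"
```

With graceful_timeout set to several hours, the last branch is effectively
only reached by requests that are genuinely stuck.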
>>
>> "inactivity-timeout" seems to function exactly as I want in that it seems
>> like it won't ever kill a process with a thread with an active request (at
>> least, I can't get it to, even by adding a long import
>> time; time.sleep(longtime)); it doesn't seem to die until the request is
>> finished. But that's why the documentation made me nervous, because it
>> implies that it could, in fact, kill a proc with an active request: "For the
>> purposes of this option, being idle means no new requests being received, or
>> no attempts by current requests to read request content or generate response
>> content for the defined period."
>>
>> The release notes for 4.1.0 say:
>>
>> """
>> 4. The inactivity-timeout option to WSGIDaemonProcess now only results in
>> the daemon process being restarted after the idle timeout period where there
>> are no active requests. Previously it would also interrupt a long running
>> request. See the new request-timeout option for a way of interrupting long
>> running, potentially blocked requests and restarting the process.
>> """
>>
>> I'd rather have a more gentle "maximum-requests" than "inactivity-timeout"
>> because then, even on very heavy days (when RAM is most likely to choke), I
>> could gracefully turn over these processes a couple times a day, which I
>> couldn't do with "inactivity-timeout" on an extremely heavy day.
>>
>> Hope this makes sense. I'm really asking:
>>
>> - whether inactivity-timeout triggering will ever SIGKILL a process with an
>>   active request, as the docs intimate
>>
>>   No, from 4.1.0 onwards.
>>
>> - whether there is any way to get maximum-requests to behave more gently
>>   under all circumstances
>>
>>   By setting a very, very long graceful-timeout.
>>
>> - for your ideas/best advice
>>
>>   Have a good read through the release notes for 4.1.0.
>>
>> Not necessarily useful in your case, but also look at request-timeout. It
>> can act as a final fail safe for when things are stuck, but since it is more
>> of a fail safe, it doesn't make use of graceful-timeout.
>>
>> Graham
>>
>>
>> Thanks for your help!
>>
>>
>>
>> On Wednesday, January 14, 2015 at 9:48:02 PM UTC-5, Graham Dumpleton wrote:
>>
>> On 15/01/2015, at 8:32 AM, Kent <[email protected]> wrote:
>>
>> > Graham, the docs state: "For the purposes of this option, being idle means
>> > no new requests being received, or no attempts by current requests to read
>> > request content or generate response content for the defined period."
>> >
>> > This implies to me that a running request that is taking a long time could
>> > actually be killed as if it were idle (suppose it were fetching a very
>> > slow database query). Is this the case?
>>
>> This is the case for mod_wsgi prior to version 4.0.
>>
>> Things have changed in mod_wsgi 4.X.
>>
>> How long are your long running requests though? The inactivity-timeout was
>> more about restarting infrequently used applications so that memory can be
>> taken back.
>>
>>
>> > Also, I'm looking for an ultra-conservative and graceful method of
>> > recycling memory. I've read your article on url partitioning, which was
>> > useful, but sooner or later, one must rely on either inactivity-timeout or
>> > maximum-requests, is that accurate? But both these will eventually, after
>> > graceful timeout/shutdown timeout, potentially kill active requests. It
>> > is valid for our app to handle long-running reports, so I was hoping for
>> > an ultra-safe mechanism.
>> > Do you have any advice here?
>>
>> The options available in mod_wsgi 4.X are much better in this area than 3.X.
>> The changes in 4.X aren't covered in main documentation though and are only
>> described in the release notes where change was made.
>>
>> In 4.X the concept of an inactivity-timeout is now separate to the idea of a
>> request-timeout. There is also a graceful-timeout that can be applied to
>> maximum-requests and some other situations as well to allow requests to
>> finish out properly before being more brutal. One can also signal the daemon
>> processes to do a more graceful restart as well.
>>
>> You cannot totally avoid having to be brutal and kill things though, or else
>> you don't have a fail safe for a stuck process where all request threads are
>> blocked on back end services and are never going to recover. Use of
>> multithreading in a process also complicates the implementation of
>> request-timeout.
>>
>> Anyway, the big question is what version are you using?
>>
>> Graham
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "modwsgi" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/modwsgi.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>
>
> --
> You received this message because you are subscribed to a topic in the Google
> Groups "modwsgi" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/modwsgi/84yzDAMFRsw/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/d/optout.
>
>