Re: [Paste] Paste's HTTP server thread pool (reliability)

Ian Bicking Fri, 02 Feb 2007 12:08:17 -0800

Shannon -jj Behrens wrote:
> All of this can get really sticky, and I fear there are no good,
> general answers.  If you do decide to start killing long-running
> threads, I do like the idea of letting the programmer explicitly state
> that the thread should be long running.  Do you really have a problem
> of threads hanging?  It's just not something I've had a problem with
> in general.


Generally no, but occasionally yes, and that's enough to concern me.
Also, there are currently no tools or even logs to help someone figure
out what might be causing problems.

The specific project we're working on involves fetching other URLs, 
which is something that can block in awkward ways.  We have some ideas 
to avoid that (probably not performing the subrequests in the request 
thread), but even so I would like some additional places where we can 
catch problems.  Generally when something goes wrong I really don't like 
the current behavior, which is that there's no way to notice until the 
whole server stops responding, and no resolution except restarting the 
server.
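
As a rough illustration of one such place to catch problems, here is a minimal sketch of a tracker that records when each worker thread picked up its current request, so something can report threads that have been busy too long. None of this is Paste's actual API: WorkerTracker, hung_after, and the 30-second threshold are all made up for the example.

```python
import threading
import time


class WorkerTracker:
    """Records when each worker thread started its current request.

    Hypothetical helper, not part of Paste's httpserver.
    """

    def __init__(self, hung_after=30.0, clock=time.time):
        self.hung_after = hung_after  # seconds; an arbitrary example threshold
        self.clock = clock            # injectable so the logic can be tested
        self._busy = {}               # thread ident -> (start time, description)
        self._lock = threading.Lock()

    def start(self, description):
        # Called by a worker thread just before it handles a request.
        with self._lock:
            self._busy[threading.get_ident()] = (self.clock(), description)

    def finish(self):
        # Called by the same worker thread when the request is done.
        with self._lock:
            self._busy.pop(threading.get_ident(), None)

    def hung(self):
        """Return (ident, seconds busy, description) for overdue workers."""
        now = self.clock()
        with self._lock:
            return sorted(
                (ident, now - started, desc)
                for ident, (started, desc) in self._busy.items()
                if now - started > self.hung_after
            )
```

A worker would call start() before invoking the application and finish() in a finally block, and a separate timer thread could log whatever hung() returns. That only gives visibility; deciding whether to kill or replace those threads is a separate question.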

I don't think there's a firm general answer -- in an effort to protect
some requests from other requests, you might instead mess up the entire
machine (e.g., if you let the number of threads simply increase, which I
think is how the non-pooled httpserver would act currently).  Or you
may want to partition requests so that some family of requests is kept
separate from another family (e.g., we'd like to partition along domain
names), but that's a fairly complicated heuristic.  On top of that,
bursts of traffic are always fairly likely, and you don't want to
mistake those for actual problems -- that's just what you should expect
to happen.

I'd really like a Paste app to be something you can start up and just
depend on to keep working indefinitely without lots of tending.  This
is one of the pieces to make that happen.  Actually, I think all that's 
needed is:

1. Isolated Python environment (workingenv, virtual-python): without 
this an installation can easily be broken by other activity on the machine.

2. A process supervisor (supervisor2, daemontools): just in case it 
segfaults.

3. Exception handling that actively tells you when things are broken.
E.g., if a database goes down, the server will still respond, but every
page will give a server error.

4. Of course, application state should never disappear because of a 
process restart.  In-memory sessions are right out as a result; 
everything has to be serializable.  That won't always work perfectly 
(e.g., when there's a hard restart or a segfault), but doing a proper 
restart should never be a problem.

5. Reasonable handling of these thread problems, if they occur. 
Alternately a forking (or generally multi-process) server that monitors 
its child processes could work.  Sadly we don't have an HTTP server that 
does that.  I'm not sure if flup really monitors its children either, or 
just spawns them and expects them to die.

6. Some monitor that checks URL(s) and handles when the URL is gone or 
misbehaving.  Ideally it could restart the process if the URL is just 
gone or not responding (supervisor2 has an XMLRPC API, for instance). 
Server errors should probably be handled via notification; restarts 
don't (or at least shouldn't) just fix those.

7. In addition to looking for responding URLs, memory leaks (or greedy 
CPU usage over a long time) would be good to detect.  These are a little 
trickier, and need a soft limit (when notification happens) then a hard 
limit (when a restart is automatically done).  Handling ulimit might be 
enough, not sure.
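
The soft/hard limit idea in item 7 can be sketched as a small classification step. This is only an illustration: check_limit and its return values are hypothetical names, and how the measurement is taken (memory, CPU time, or something else) is left open.

```python
def check_limit(value, soft, hard):
    """Classify a measurement against a soft and a hard limit.

    Returns "ok", "notify" (soft limit reached) or "restart"
    (hard limit reached). All names here are illustrative.
    """
    if soft > hard:
        raise ValueError("soft limit must not exceed hard limit")
    if value >= hard:
        return "restart"
    if value >= soft:
        return "notify"
    return "ok"
```

For example, with a soft limit of 200 and a hard limit of 400 (in whatever unit the measurement uses), a reading of 250 would produce a notification and a reading of 450 would trigger a restart.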


Right now we have 1-4.  Then we just need 5-7, and to plug them all 
together nicely so people can easily deploy the entire combination.  The 
result should be something as reliable as PHP, and also reliable in 
situations when the sysadmin really doesn't want to tend to individual 
applications.
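
To make the URL monitor from item 6 a bit more concrete, here is a minimal sketch of the decision it has to make, using only the standard library. It assumes that "gone or not responding" means a connection failure or timeout, and that any 5xx status counts as a server error. check_url and the returned action names are hypothetical, and the part that actually restarts the process or sends the notification is left out.

```python
import socket
import urllib.error
import urllib.request


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return "ok", "notify" (server error) or "restart" (gone / not responding).

    The opener argument exists only so the logic can be exercised
    without a network connection.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        status = exc.code
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # Nothing answered at all: a restart might help here.
        return "restart"
    if status >= 500:
        # The process is alive but broken; a restart probably won't fix it.
        return "notify"
    return "ok"
```

Note that HTTPError has to be caught before URLError, since it is a subclass of it. A supervisor hook would then map "restart" to whatever the process supervisor offers and "notify" to email or similar.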

-- 
Ian Bicking | [EMAIL PROTECTED] | http://blog.ianbicking.org

