Hi quiet Paste list.

If you are anything like me, you hate it when your HTTP server freezes 
up because all the threads are wedged for some reason.  If you aren't 
using pooling I guess this won't happen, but instead you'll eventually 
have tons of wedged threads sitting around and that's not great either.

Anyway, in the trunk I made some additions to the thread pool and a 
small app (egg:Paste#watch_threads) that lets you monitor the pool 
through the web and even kill threads.  (How reliable the thread killing 
is, I'm not sure -- it worked for a couple cases I tried, like reading 
past the end of a socket and an infinite loop in Python.)

Of course it is flawed, since if your thread pool is exhausted you can't 
access the app.  Plus, while seeing wedged threads is nice for 
debugging, it's not really something best managed manually.

So I'm thinking about how the thread pool could be improved.  Here's my 
idea; I'm interested in opinions:

When a request comes in and there are no free threads to handle it, a 
new thread should be created up to max_threads (configurable).  Maybe 
the thread should only live for one request, or maybe it should be added 
to the pool and the pool periodically reduced in size if possible.
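To make the grow-on-demand part concrete, here's a rough sketch.  Only 
the max_threads name comes from the above; the class and everything 
else in it is hypothetical, and a real pool would also need the 
periodic-shrink logic:

```python
import queue
import threading


class GrowingPool:
    """Sketch: a pool that spawns extra workers on demand, capped at
    max_threads.  Hypothetical code, not Paste's actual pool."""

    def __init__(self, min_threads=2, max_threads=10):
        self.max_threads = max_threads
        self.tasks = queue.Queue()
        self.workers = []
        self.idle = 0
        self.lock = threading.Lock()
        for _ in range(min_threads):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._run, daemon=True)
        self.workers.append(t)
        t.start()

    def _run(self):
        while True:
            with self.lock:
                self.idle += 1
            job = self.tasks.get()   # blocks while the worker is idle
            with self.lock:
                self.idle -= 1
            if job is None:          # shutdown sentinel
                return
            job()

    def dispatch(self, job):
        with self.lock:
            # No free worker and room to grow: add a thread rather
            # than making the new request wait behind a possibly
            # wedged one.
            if self.idle == 0 and len(self.workers) < self.max_threads:
                self._spawn()
        self.tasks.put(job)
```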

When a request comes in and the maximum number of threads has already 
been created, the thread most likely to be wedged (the one that's been 
working the longest) should be killed and another one added.  If 
none of the threads has been working very long (wedged_thread_threshold) 
then we assume we just have a lot of requests coming in, and we simply 
queue the request.  That means if like 10 threads all get wedged at 
once, and another request comes in, it could end up queued until yet 
another request comes in.  And then that other request will kill a 
thread, the old request gets off the queue, and the new request is back 
on the queue.  I'm not sure how to deal with that problem, except maybe 
to try to empty the queue with multiple kills once a wedged situation is 
detected.
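For the kill itself, the usual CPython trick is 
PyThreadState_SetAsyncExc via ctypes -- which also explains the 
unreliability: the exception is only delivered when the thread next 
executes Python bytecode, so a thread blocked in C code won't die.  A 
sketch of the kill-the-longest-worker policy (wedged_thread_threshold 
is from the proposal above; the bookkeeping is made up):

```python
import ctypes
import threading
import time


def async_raise(thread, exc=SystemExit):
    """Asynchronously raise an exception in another thread.
    CPython-only, and only takes effect once the target thread runs
    Python bytecode again."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), ctypes.py_object(exc))
    if res > 1:
        # We hit more than one thread state; undo before anything breaks.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")


def reap_wedged(workers, start_times, wedged_thread_threshold=30.0):
    """Kill the longest-working thread, but only if it has been busy
    longer than wedged_thread_threshold seconds; otherwise assume it's
    just a burst of traffic and the request should be queued.
    Returns True if a thread was killed."""
    now = time.time()
    oldest = max(workers, key=lambda t: now - start_times.get(t, now))
    if now - start_times.get(oldest, now) > wedged_thread_threshold:
        async_raise(oldest)
        return True
    return False
```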

Maybe we should add an API to the request environment to tell the server 
that a long-running request is expected.  This way a conscientious 
programmer could still do long-running requests without fear of the 
thread being killed, but you'd have to express real intention to do so.
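That API could be as small as a callable the server drops into the 
WSGI environ.  The key name below is entirely hypothetical -- the 
point is just that the intent gets expressed explicitly, per request:

```python
def slow_report(environ, start_response):
    """A WSGI app that declares itself long-running.  The
    'paste.httpserver.thread_options' key is a made-up example of what
    such an extension could look like."""
    set_option = environ.get('paste.httpserver.thread_options')
    if set_option is not None:
        # Exempt this worker thread from wedged-thread killing.
        set_option(long_running=True)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'this may take a while...']
```

Apps that don't know about the extension, and servers that don't 
provide it, both keep working unchanged.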

We can check if threads that we killed are actually dead (they'll still 
be listed in threading._active).  If we see an excess of these we can 
kill the whole process (assuming that a supervisor process is going to 
restart the server).  Configurable, zombie_thread_threshold or 
something, obviously not on by default.
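The zombie check could look something like this 
(zombie_thread_threshold is from above; the bookkeeping around it is 
hypothetical -- is_alive() reflects the same liveness that keeps a 
thread in threading._active):

```python
import os
import threading


def count_zombies(killed):
    """Count threads we sent a kill to that are still alive, i.e. the
    kills that didn't take."""
    return sum(1 for t in killed if t.is_alive())


def maybe_restart(killed, zombie_thread_threshold=None):
    """Hard-exit if too many kills didn't take, so a supervisor
    process can restart the server.  Disabled (None) by default, as
    proposed."""
    if zombie_thread_threshold is None:
        return
    if count_zombies(killed) >= zombie_thread_threshold:
        os._exit(3)  # skip cleanup; the supervisor restarts us
```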

Anyway, any thoughts anyone has would be appreciated.  Clearly I'll have 
to write up a document explaining all this, as it's going to be too long 
to go in a docstring.

I guess I'll also have to clean up some of the lingering issues in 
Paste's HTTP server too (I think just the wsgi.input blocking problem 
and the limited request methods), as once I start relying on this stuff 
it'll be harder to move to another server.  So I can no longer 
vacillate on what server people should use -- ours!

-- 
Ian Bicking | [EMAIL PROTECTED] | http://blog.ianbicking.org

_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users
