James Taylor wrote:
> Hi Ian,
>
> I'm not on the Paste list so I didn't see this thread until it got
> crossposted to pylons. However, I did contribute the thread-pool to
> Paste#httpserver so I suppose I should weigh in.
>
>> When a request comes in and there are no free threads to handle it, a
>> new thread should be created up to max_threads (configurable). Maybe
>> the thread should only live for one request, or maybe it should be added
>> to the pool and the pool periodically reduced in size if possible.
>
> Absolutely, having the threadpool grow when load spikes is a good idea.
> I favor adding the thread to the pool, and then backing off and killing
> some threads once the load goes down.
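[Editor's note: a minimal sketch of the grow-up-to-max_threads idea quoted above. All names here (`DynamicPool`, `idle`, `dispatch`) are invented for illustration; this is not Paste#httpserver's actual implementation.]

```python
# Hypothetical sketch: grow the pool, up to max_threads, whenever a
# request arrives and no worker is free. Not Paste's real code.
import queue
import threading


class DynamicPool:
    def __init__(self, nworkers=2, max_threads=10):
        self.max_threads = max_threads
        self.requests = queue.Queue()
        self.idle = 0                 # workers currently waiting for a job
        self.lock = threading.Lock()
        self.workers = []
        for _ in range(nworkers):
            self._spawn()

    def _spawn(self):
        t = threading.Thread(target=self._worker, daemon=True)
        self.workers.append(t)
        t.start()

    def _worker(self):
        while True:
            with self.lock:
                self.idle += 1
            job = self.requests.get()
            with self.lock:
                self.idle -= 1
            if job is None:
                return                # shutdown sentinel
            job()

    def dispatch(self, job):
        with self.lock:
            # No free worker and room to grow: add a thread to the pool.
            if self.idle == 0 and len(self.workers) < self.max_threads:
                self._spawn()
        self.requests.put(job)
```

Shrinking the pool again when load drops (the "backing off" part) would need an extra idle-timeout check in `_worker`, which is omitted here.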
Creating a thread isn't terribly expensive -- not terribly cheap either,
but I'm not sure whether the management overhead (or, uh, coding
overhead) is entirely worth it. It's only really useful for a particular
kind of high load -- generally one where there are lots of kind-of-slow
requests that are blocking on something other than CPU. If you are
simply getting lots of requests, it's better to just let them queue. So
maybe the threadpool should only expand when requests seem to be slow
(e.g., more than 1 second, or maybe even 5 seconds). If no requests
among the current workers are slower than that (or maybe no more than
one or two are), then just let the request queue and the next free
thread will handle it.

Of course, if you are getting 90% bad-performing requests and 10%
good-performing requests, the worker pool will get totally wedged up by
those bad-performing requests. And even if you expand the pool a little,
it'll still get wedged up fairly fast.

> Of course, monitoring the activity and getting the policy right is
> pretty annoying, which is why I didn't do it in the first place ;)
>
>> When a request comes in and there are already a maximum number of
>> threads created, the thread most likely to be wedged (the one that's
>> been working the longest) should be killed and another one added. If
>> none of the threads has been working very long (wedged_thread_threshold)
>> then we assume we just have a lot of requests coming in, and we simply
>> queue the request. That means if like 10 threads all get wedged at
>> once, and another request comes in, it could end up queued until yet
>> another request comes in. And then that other request will kill a
>> thread, the old request gets off the queue, and the new request is back
>> on the queue. I'm not sure how to deal with that problem, except maybe
>> to try to empty the queue with multiple kills once a wedged situation is
>> detected.
>
> Killing active threads makes me very uncomfortable.
> We have had these
> sorts of lockups, but they have always turned out to be bugs in our
> application which I am glad Paste did not hide from us.

Of course, there has to be logging for all of this, and maybe some kind
of notification as well (it's easy to miss stuff that just sits in a log
file).

> We have some requests that are long running (particularly proxies to
> other sites that are sometimes very slow) so killing the thread that has
> been locked the longest or something would work poorly. Now, if you have
> some other idea of how to detect deadlocks maybe that would work well,
> but I think it is a *hard problem*.

Certainly. We have the same situation and the opposite desire -- if we
proxy to another site that is really slow (or totally dead, or a DNS
server isn't responding in a timely manner, etc.) we'd like to kill that
request rather than let it bog down all the other requests. If there are
threads available then I wouldn't want to kill anything, or at least not
anything that isn't really, really old (e.g., a couple of hours).

And again, there should be a way of marking things as long requests --
it has frustrated me when Apache killed requests that lasted longer than
15 minutes or whatever, when I expected the request to take that long.
Those are fairly unusual, of course (a browser won't wait that long
anyway), but they shouldn't be left out. I'm not sure if it's a good
idea to allow a request header to hint something about the request
length...? It seems like that could potentially be abused for a DoS, but
I dunno.

> For our QOS purposes we're thinking about having multiple thread pools
> in the server and the ability to dispatch requests to different pools.
> Thus our long running requests or file uploads could have their own
> dedicated thread pools, while the main pool would still be ready to
> serve the fast requests.

We considered this, in our case maybe based on host name. But it's
pretty tricky. I can imagine a pluggable pool selector.
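[Editor's note: a pluggable pool selector might look roughly like this sketch. All names (`PoolDispatcher`, `by_path`, the `dispatch()` pool protocol) are hypothetical and do not exist in Paste -- and note the reply's caveat that at the point Paste currently picks a thread, the path and headers haven't been parsed yet.]

```python
# Hypothetical sketch: route each request to one of several thread
# pools via a pluggable selector function. Invented names throughout.
class PoolDispatcher:
    def __init__(self, pools, selector, default='main'):
        self.pools = pools        # name -> object with a dispatch(job) method
        self.selector = selector  # callable(environ) -> pool name or None
        self.default = default

    def dispatch(self, environ, job):
        name = self.selector(environ) or self.default
        # Fall back to the default pool for unknown names.
        pool = self.pools.get(name, self.pools[self.default])
        pool.dispatch(job)


def by_path(environ):
    # Example selector: send uploads and proxied requests to a
    # dedicated slow pool, everything else to the main pool.
    path = environ.get('PATH_INFO', '')
    if path.startswith('/upload') or path.startswith('/proxy'):
        return 'slow'
    return None
```

Selecting by host name, as mentioned above, would just be a different selector function reading `HTTP_HOST` instead of `PATH_INFO`.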
Well, actually that's hard -- the request has already been delegated to
a specific thread before we know almost anything about it, including
headers and path. You'd need an async (non-threaded) server to do the
parsing before delegating to a thread pool -- something like Medusa or
Twisted. Arguably Medusa would be a better basis for a threaded HTTP
server than the current setup; it's a stable, mature, and smallish HTTP
server (Twisted is awfully bulky).

You could at least whitelist some client IP addresses (e.g., 127.0.0.1
and all the developers' machines). That information is available early,
and you can always open up just *one more* thread to respond to such a
request. Then at least developers could get in to see what's happening
(e.g., with the egg:Paste#watch_threads app). But ideally the logs would
have the same information, and poking around a lot might not be all that
useful (as opposed to just restarting the whole server).

> I have a student working on this at the moment, and if it works well I
> hope to help him turn it into a nice patch.

He should be sure to update to see the changes I've made in Paste 1.2 --
it adds enough information to at least track the requests that are
misbehaving (though it doesn't actually track them with any logs
currently).

--
Ian Bicking | [EMAIL PROTECTED] | http://blog.ianbicking.org

_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users
