Re: [Paste] Paste's HTTP server thread pool (reliability)

2007-02-02 Thread Ian Bicking

Shannon -jj Behrens wrote:
 On 2/2/07, Ian Bicking [EMAIL PROTECTED] wrote:
 Shannon -jj Behrens wrote:
  All of this can get really sticky, and I fear there are no good,
  general answers.  If you do decide to start killing long-running
  threads, I do like the idea of letting the programmer explicitly state
  that the thread should be long running.  Do you really have a problem
  of threads hanging?  It's just not something I've had a problem with
  in general.

 Generally no, but occasionally yes, and that's enough to concern me.
 Also currently there are no tools or even logs to really help someone
 figure out what might be causing problems.

 The specific project we're working on involves fetching other URLs,
 which is something that can block in awkward ways.  We have some ideas
 to avoid that (probably not performing the subrequests in the request
 thread), but even so I would like some additional places where we can
 catch problems.  Generally when something goes wrong I really don't like
 the current behavior, which is that there's no way to notice until the
 whole server stops responding, and no resolution except restarting the
 server.

 I don't think there's a firm general answer -- in an effort to protect
 some requests from other requests, you might instead mess up the entire
 machine (e.g., if you let the number of threads simply increase, which I
 think is how the non-pooled httpserver would act currently).  Or, you
 may want to partition requests so that some family of requests is kept
 separate from another family (e.g., we'd like to partition along domain
 names), but that requires fairly complicated heuristics.  On top of that,
 bursts of traffic are always fairly likely, and you don't want to
 mistake those for actual problems -- that's just what you should expect
 to happen.

 I'd really like a Paste app to be something you can start up and just
 depend on it to keep working indefinitely without lots of tending.  This
 is one of the pieces to make that happen.  Actually, I think all that's
 needed is:

 1. Isolated Python environment (workingenv, virtual-python): without
 this an installation can easily be broken by other activity on the 
 machine.

 2. A process supervisor (supervisor2, daemontools): just in case it
 segfaults.

 3. Exception handling that actively tells you when things are broken.
 E.g., if a database goes down everything will respond, but every page
 will give a server error.

 4. Of course, application state should never disappear because of a
 process restart.  In-memory sessions are right out as a result;
 everything has to be serializable.  That won't always work perfectly
 (e.g., when there's a hard restart or a segfault), but doing a proper
 restart should never be a problem.

 5. Reasonable handling of these thread problems, if they occur.
 Alternately a forking (or generally multi-process) server that monitors
 its child processes could work.  Sadly we don't have an HTTP server that
 does that.  I'm not sure if flup really monitors its children either, or
 just spawns them and expects them to die.

 6. Some monitor that checks URL(s) and handles when the URL is gone or
 misbehaving.  Ideally it could restart the process if the URL is just
 gone or not responding (supervisor2 has an XMLRPC API, for instance).
 Server errors should probably be handled via notification; restarts
 don't (or at least shouldn't) just fix those.

 7. In addition to looking for responding URLs, memory leaks (or greedy
 CPU usage over a long time) would be good to detect.  These are a little
 trickier, and need a soft limit (when notification happens) then a hard
 limit (when a restart is automatically done).  Handling ulimit might be
 enough, not sure.
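The soft-limit/hard-limit policy in items 6 and 7 can be reduced to a small decision function that a watchdog would run each check cycle. A minimal sketch (the function name, thresholds, and return values are all illustrative, not from this thread):

```python
def monitor_action(url_ok, rss_mb, soft_mb=400, hard_mb=600):
    """Decide what a watchdog should do after one check cycle.

    url_ok:  did the pingable URL respond correctly?
    rss_mb:  current resident memory of the process, in MB.
    Thresholds are made-up illustration values."""
    if not url_ok:
        # URL gone or not responding: restart the process (item 6).
        return "restart"
    if rss_mb >= hard_mb:
        # Hard limit crossed: restart automatically (item 7).
        return "restart"
    if rss_mb >= soft_mb:
        # Soft limit crossed: notify a human, but keep running.
        return "notify"
    return "ok"
```

In the supervisor2 case, the "restart" branch would presumably go through its XMLRPC API; "notify" would be email or whatever alerting is in place, since restarts don't fix server errors.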


 Right now we have 1-4.  Then we just need 5-7, and to plug them all
 together nicely so people can easily deploy the entire combination.  The
 result should be something as reliable as PHP, and also reliable in
 situations when the sysadmin really doesn't want to tend to individual
 applications.
 
 I don't have anything really useful to say.  By the way, we're using
 Nagios to provide *some* assurances that things haven't gone awry.

I've tried Nagios a little bit in the past, but found it rather hard to 
set up for the small task I had in mind (just checking some URLs).  And 
it couldn't do something like restart a service (AFAIK).  Still, this is 
certainly something that any serious developer should have (be it Nagios 
or mon or Big Brother or whatever).

It would be a nice addition to Pylons to add a convention for a pingable 
URL in an application.  The URL should do little work, but users could 
add on to it -- typically if you have a database, you might check that 
you can connect to the database, for instance.  Or check that critical 
directories exist and are writable, etc.
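That pingable-URL convention could be as small as a WSGI app that runs a dict of lightweight checks and returns 200 only when all of them pass. A sketch under those assumptions (the factory name and check names are hypothetical, not an actual Pylons API):

```python
def make_ping_app(checks):
    """Build a WSGI app for a pingable health-check URL.

    `checks` maps a name to a zero-arg callable that raises on failure,
    e.g. {"db": check_db_connection}.  All checks should do little work."""
    def ping_app(environ, start_response):
        failures = []
        for name, check in checks.items():
            try:
                check()
            except Exception as exc:
                failures.append("%s: %s" % (name, exc))
        if failures:
            status = "500 Internal Server Error"
            body = ("FAIL\n" + "\n".join(failures)).encode("utf-8")
        else:
            status = "200 OK"
            body = b"OK"
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
    return ping_app
```

An external monitor then only needs to GET that URL and look at the status code, which keeps the monitor itself trivial.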

I had intended PyPeriodic to be the basis for a URL checker, since the 
periodic part felt harder to me than the actual URL fetching, and it 
would 

Re: [Paste] Paste's HTTP server thread pool (reliability)

2007-02-02 Thread Cliff Wells

On Fri, 2007-02-02 at 16:03 -0600, Ian Bicking wrote:
 Shannon -jj Behrens wrote:

 
  The specific project we're working on involves fetching other URLs,
  which is something that can block in awkward ways.  We have some ideas
  to avoid that (probably not performing the subrequests in the request
  thread), but even so I would like some additional places where we can
  catch problems.  Generally when something goes wrong I really don't like
  the current behavior, which is that there's no way to notice until the
  whole server stops responding, and no resolution except restarting the
  server.
 


This doesn't really address the larger issue, but I've found that using
Twisted for such things works great.  I've never really been able to
manage writing an entire application in Twisted, but writing simple apps
that do things like fetch a bunch of URLs and republish the results as a
local service is remarkably simple and robust.  Also, unlike much of
Twisted, there tend to be plenty of examples for doing such things lying
around the web.

A few months ago I wrote an RSS aggregator in Twisted that would fetch
remote feeds and republish them as a local service that a TurboGears app
could then quickly and reliably fetch from.  This nicely sidesteps
blocking threads in your Paste/Pylons/CherryPy server.
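The decoupling Cliff describes -- slow remote fetches happen off on their own, and the web app only ever reads a local copy -- can be sketched without Twisted too. Here is an illustrative stdlib version (class and method names are made up; Cliff's actual aggregator was a Twisted service, not this):

```python
import threading
import time

class FeedCache:
    """Background fetcher plus in-memory cache: the slow network work
    runs on its own thread, so request threads only read memory and
    can never block on a remote server."""

    def __init__(self, fetch, interval=300.0):
        self._fetch = fetch        # zero-arg callable returning fresh data
        self._interval = interval  # seconds between refreshes
        self._lock = threading.Lock()
        self._data = None

    def refresh_once(self):
        try:
            fresh = self._fetch()  # slow, blocking network work
        except Exception:
            return                 # keep serving the last good copy
        with self._lock:
            self._data = fresh

    def start(self):
        """Kick off periodic refreshes on a daemon thread."""
        def loop():
            while True:
                self.refresh_once()
                time.sleep(self._interval)
        threading.Thread(target=loop, daemon=True).start()

    def get(self):
        """Cheap, non-blocking read for request handlers."""
        with self._lock:
            return self._data
```

The key property is the failure mode: if the remote feed goes down, requests keep getting the last good copy instead of hanging, which is exactly the behavior Ian's items 5-6 are after.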

Regards,
Cliff

