On 2/2/07, Ian Bicking <[EMAIL PROTECTED]> wrote:
> Shannon -jj Behrens wrote:
> > All of this can get really sticky, and I fear there are no good,
> > general answers.  If you do decide to start killing long-running
> > threads, I do like the idea of letting the programmer explicitly state
> > that the thread should be long running.  Do you really have a problem
> > of threads hanging?  It's just not something I've had a problem with
> > in general.
>
> Generally no, but occasionally yes, and that's enough to concern me.
> Also currently there are no tools or even logs to really help someone
> figure out what might be causing problems.
>
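One way to get the kind of visibility described above is to record when each worker thread picks up a request and periodically list the ones that have been busy too long. The sketch below is only an illustration, not anything Paste's httpserver provides: `RequestTracker`, `start`, `finish`, and `suspects` are hypothetical names, and the `long_running` flag stands in for the idea of letting the programmer explicitly mark a thread as long running.

```python
import threading
import time


class RequestTracker:
    """Records when each worker thread started its current request.

    Hypothetical helper for illustration; not part of Paste.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._active = {}  # thread id -> (start time, long_running flag)

    def start(self, long_running=False, thread_id=None):
        """Call when a worker begins handling a request."""
        if thread_id is None:
            thread_id = threading.get_ident()
        with self._lock:
            self._active[thread_id] = (self._clock(), long_running)

    def finish(self, thread_id=None):
        """Call when the worker is done with the request."""
        if thread_id is None:
            thread_id = threading.get_ident()
        with self._lock:
            self._active.pop(thread_id, None)

    def suspects(self, max_seconds):
        """Return sorted (thread_id, elapsed) pairs for requests that have
        run longer than max_seconds and were not marked long_running."""
        now = self._clock()
        with self._lock:
            items = list(self._active.items())
        return sorted(
            (tid, now - started)
            for tid, (started, long_running) in items
            if not long_running and now - started > max_seconds
        )
```

A monitoring thread could call `suspects()` every few seconds and log the result, which at least leaves a trail before the whole server stops responding. What to do with a suspect thread (log, notify, or try to kill it) is a separate decision this sketch does not make.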
> The specific project we're working on involves fetching other URLs,
> which is something that can block in awkward ways.  We have some ideas
> to avoid that (probably not performing the subrequests in the request
> thread), but even so I would like some additional places where we can
> catch problems.  Generally when something goes wrong I really don't like
> the current behavior, which is that there's no way to notice until the
> whole server stops responding, and no resolution except restarting the
> server.
>
> I don't think there's a firm general answer -- in an effort to protect
> some requests from other requests, you might instead mess up the entire
> machine (e.g., if you let the number of threads simply increase, which I
> think is how the non-pooled httpserver would act currently).  Or, you
> may want to partition requests so that some family of requests is kept
> separate from another family (e.g., we'd like to partition along domain
> names), but that's a fairly complicated heuristic.  And along with that,
> bursts of traffic are always fairly likely, and you don't want to
> mistake those for actual problems -- that's just what you should expect
> to happen.
>
> I'd really like a Paste app to be something you can start up and just
> depend on it to keep working indefinitely without lots of tending.  This
> is one of the pieces to make that happen.  Actually, I think all that's
> needed is:
>
> 1. Isolated Python environment (workingenv, virtual-python): without
> this an installation can easily be broken by other activity on the machine.
>
> 2. A process supervisor (supervisor2, daemontools): just in case it
> segfaults.
>
> 3. Exception handling that actively tells you when things are broken.
> E.g., if a database goes down everything will respond, but every page
> will give a server error.
>
> 4. Of course, application state should never disappear because of a
> process restart.  In-memory sessions are right out as a result;
> everything has to be serializable.  That won't always work perfectly
> (e.g., when there's a hard restart or a segfault), but doing a proper
> restart should never be a problem.
>
> 5. Reasonable handling of these thread problems, if they occur.
> Alternately a forking (or generally multi-process) server that monitors
> its child processes could work.  Sadly we don't have an HTTP server that
> does that.  I'm not sure if flup really monitors its children either, or
> just spawns them and expects them to die.
>
> 6. Some monitor that checks URL(s) and handles when the URL is gone or
> misbehaving.  Ideally it could restart the process if the URL is just
> gone or not responding (supervisor2 has an XMLRPC API, for instance).
> Server errors should probably be handled via notification; restarts
> don't (or at least shouldn't) just fix those.
>
> 7. In addition to looking for responding URLs, memory leaks (or greedy
> CPU usage over a long time) would be good to detect.  These are a little
> trickier, and need a soft limit (when notification happens) then a hard
> limit (when a restart is automatically done).  Handling ulimit might be
> enough, not sure.
>
>
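The soft/hard limit idea in item 7 can be sketched as a small decision function: below the soft limit nothing happens, between the two limits someone is notified, and at the hard limit a restart is triggered. This is a minimal sketch under assumptions: `check_limit` is a hypothetical name, the returned strings are placeholders for real actions, and how the measured value is obtained (memory, CPU time) is left to the caller.

```python
def check_limit(value, soft_limit, hard_limit):
    """Map a measured value to 'ok', 'notify', or 'restart'.

    Hypothetical helper: the value could be resident memory or CPU
    seconds, as long as all three arguments use the same unit.
    """
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    if value >= hard_limit:
        return "restart"   # hard limit reached: restart automatically
    if value >= soft_limit:
        return "notify"    # soft limit reached: tell someone, keep running
    return "ok"
```

For example, with a made-up soft limit of 200 and hard limit of 400 (in whatever unit is being measured), a reading of 250 would produce a notification and a reading of 400 a restart.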
> Right now we have 1-4.  Then we just need 5-7, and to plug them all
> together nicely so people can easily deploy the entire combination.  The
> result should be something as reliable as PHP, and also reliable in
> situations when the sysadmin really doesn't want to tend to individual
> applications.

I don't have anything really useful to say.  By the way, we're using
Nagios to provide *some* assurances that things haven't gone awry.
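As a rough illustration of the URL check from item 6 above (this is not how Nagios does it, just a standalone sketch), a probe could distinguish "no response at all" from "responded with a server error", since the first suggests a restart and the second a notification. The names `classify` and `probe` and the returned strings are assumptions made up for this example; treating every status below 500 as fine is a simplification.

```python
import http.client
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code (or None for no response) to an action."""
    if status is None:
        return "restart"   # URL gone or not responding
    if status >= 500:
        return "notify"    # server error: a restart probably won't fix it
    return "ok"


def probe(url, timeout=10):
    """Fetch url once and return 'ok', 'notify', or 'restart'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        status = exc.code
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a broken response.
        status = None
    return classify(status)
```

The "restart" result would then be handed to whatever supervises the process, and "notify" to whatever sends mail or pages someone.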

Best Regards,
-jj

-- 
http://jjinux.blogspot.com/

_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users