On 2/2/07, Ian Bicking <[EMAIL PROTECTED]> wrote:
> Shannon -jj Behrens wrote:
> > All of this can get really sticky, and I fear there are no good,
> > general answers.  If you do decide to start killing long-running
> > threads, I do like the idea of letting the programmer explicitly state
> > that the thread should be long running.  Do you really have a problem
> > of threads hanging?  It's just not something I've had a problem with
> > in general.
>
> Generally no, but occasionally yes, and that's enough to concern me.
> Also currently there are no tools or even logs to really help someone
> figure out what might be causing problems.
>
> The specific project we're working on involves fetching other URLs,
> which is something that can block in awkward ways.  We have some ideas
> to avoid that (probably not performing the subrequests in the request
> thread), but even so I would like some additional places where we can
> catch problems.  Generally when something goes wrong I really don't like
> the current behavior, which is that there's no way to notice until the
> whole server stops responding, and no resolution except restarting the
> server.
>
> I don't think there's a firm general answer -- in an effort to protect
> some requests from other requests, you might instead mess up the entire
> machine (e.g., if you let the number of threads simply increase, which I
> think is how the non-pooled httpserver would act currently).  Or, you
> may want to partition requests so that some family of requests is kept
> separate from another family (e.g., we'd like to partition along domain
> names), but that's a fairly complicated heuristic.  And along with that,
> bursts of traffic are always fairly likely, and you don't want to
> mistake those for actual problems -- that's just what you should expect
> to happen.
>
> I'd really like a Paste app to be something you can start up and just
> depend on it to keep working indefinitely without lots of tending.  This
> is one of the pieces to make that happen.  Actually, I think all that's
> needed is:
>
> 1. Isolated Python environment (workingenv, virtual-python): without
> this an installation can easily be broken by other activity on the machine.
>
> 2. A process supervisor (supervisor2, daemontools): just in case it
> segfaults.
>
> 3. Exception handling that actively tells you when things are broken.
> E.g., if a database goes down everything will respond, but every page
> will give a server error.
>
> 4. Of course, application state should never disappear because of a
> process restart.  In-memory sessions are right out as a result;
> everything has to be serializable.  That won't always work perfectly
> (e.g., when there's a hard restart or a segfault), but doing a proper
> restart should never be a problem.
>
> 5. Reasonable handling of these thread problems, if they occur.
> Alternately a forking (or generally multi-process) server that monitors
> its child processes could work.  Sadly we don't have an HTTP server that
> does that.  I'm not sure if flup really monitors its children either, or
> just spawns them and expects them to die.
>
> 6. Some monitor that checks URL(s) and handles when the URL is gone or
> misbehaving.  Ideally it could restart the process if the URL is just
> gone or not responding (supervisor2 has an XMLRPC API, for instance).
> Server errors should probably be handled via notification; restarts
> don't (or at least shouldn't) just fix those.
>
> 7. In addition to looking for responding URLs, memory leaks (or greedy
> CPU usage over a long time) would be good to detect.  These are a little
> trickier, and need a soft limit (when notification happens) then a hard
> limit (when a restart is automatically done).  Handling ulimit might be
> enough, not sure.
>
>
> Right now we have 1-4.  Then we just need 5-7, and to plug them all
> together nicely so people can easily deploy the entire combination.  The
> result should be something as reliable as PHP, and also reliable in
> situations when the sysadmin really doesn't want to tend to individual
> applications.
I don't have anything really useful to say.  By the way, we're using
Nagios to provide *some* assurances that things haven't gone awry.

Best Regards,
-jj

-- 
http://jjinux.blogspot.com/
_______________________________________________
Paste-users mailing list
[email protected]
http://webwareforpython.org/cgi-bin/mailman/listinfo/paste-users
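For what it's worth, the decision logic in items 6 and 7 of the quoted message (restart when a URL is gone or not responding, notify on server errors, soft limit then hard limit for resources) can be sketched roughly as below. The function names, the 'ok'/'notify'/'restart' strings, and the choice to treat any status of 400 or above as a notification are assumptions made for illustration. Actually restarting the process (say, through a supervisor) or sending the notification is left to the caller.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return 'ok', 'notify' or 'restart' for one monitored URL."""
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        status = exc.code
    except (urllib.error.URLError, OSError):
        # Refused connection, DNS failure, timeout: gone or not responding.
        return "restart"
    # Assumption: a restart will not fix an error page, so only notify.
    return "notify" if status >= 400 else "ok"


def check_limit(value, soft, hard):
    """Soft limit triggers a notification, hard limit triggers a restart."""
    if value >= hard:
        return "restart"
    if value >= soft:
        return "notify"
    return "ok"
```

The opener argument is there only so the check can be exercised without a network; in normal use it stays at its default. check_limit() works for any measurement with the same units as its limits, such as resident memory in megabytes.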
