Hendrik Boom <hend...@topoi.pooq.com> writes:
> On Wed, May 04, 2016 at 09:45:24PM +0100, Rainer Weikusat wrote:
>> Stephanie Daugherty <sdaughe...@gmail.com> writes:
>> > Process supervision is something I'm very opinionated about. In a number of
>> > high availability production environments, it's a necessary evil.
>> >
>> > However, it should *never* be an out of the box default for any
>> > network-exposed service. Service failures should be extraordinary events,
>> > and we should strive to keep treating them as such,
>>
>> That's based on a particular assumption about how 'automatic restarts'
>> will be used, namely, instead of fixing server errors and not as a
>> complement to that: I treat 'server failures' as 'extraordinary events'
>> but users don't (and shouldn't): They should experience as little
>> downtime as technically possible.
>>
>> [...]
>>
>> > The second reason is that it will reduce the number of high-quality bug
>> > reports developers receive - if failure is part of the routine, it tends
>> > not to get investigated very thoroughly, if at all.
>>
>> It greatly reduces the number of "low-quality" (or rather, "no quality")
>> bug reports I receive as I don't (usually) get frantic phone calls at
>> 3am UK time because a server in Texas terminated itself for some
>> reason. Instead, I can collect the core file as soon as I get around to
>> that and fix the bug.
>>
>> NB: I deal with appliances (as developer) and not with servers (as
>> sysadmin).
>
> An excellent example of why respawning needs to be an option, and the
> OS should neither force it on nor off.

It's technically an option for 'our' system because the service
supervisor/monitor is just a command which is (or isn't) used as part of
a complete 'server invocation' (usually from a sysv-style init.d script)
and not a Master Control Program, and that's what it should IMHO be. But
I'm certainly using it for all 'new' servers.

There are other desirable effects of that, eg, the system becomes (to a
degree) self-healing: Say some server can't currently work because of a
file system permission issue (or another transient problem, eg, disk
full): It's sufficient to remedy the specific problem in order to restore
everything to working order, as the affected servers will just start to
work the next time they're restarted after the situation has improved.
There's no need to go hunting for "stuff that doesn't run although it
should" and restart it manually (and consequently, no risk of overlooking
something).

But leaving these two general remarks aside, I don't quite understand
what you wanted to express.

_______________________________________________
Dng mailing list
Dng@lists.dyne.org
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
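The reply describes the supervisor as an ordinary command placed in front of the server invocation rather than a mandatory init feature, with the self-healing effect coming from simply retrying after a pause. Below is a minimal Python sketch of such a wrapper. It is not the monitor from the post, which isn't shown. The function name `supervise()`, the fixed restart delay, and the rule that a clean exit with status 0 means "stopped on purpose, don't respawn" are all illustrative assumptions.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=None, delay=1.0):
    """Run argv and respawn it whenever it exits with a non-zero status.

    Hypothetical helper, not the monitor described in the post.
    Stops when the child exits with status 0 (treated as an intentional
    shutdown) or after max_restarts respawns (None means no limit).
    Returns the exit statuses observed, oldest first.
    """
    statuses = []
    restarts = 0
    while True:
        # Start the server and block until it terminates. On POSIX a
        # negative status means the child was killed by that signal.
        status = subprocess.call(argv)
        statuses.append(status)
        if status == 0:
            return statuses
        # Log the failure; a crashed server may still leave a core file
        # behind for later debugging, depending on core-dump settings.
        print(f"supervise: {argv[0]} exited with status {status}",
              file=sys.stderr)
        if max_restarts is not None and restarts >= max_restarts:
            return statuses
        restarts += 1
        # Wait before respawning so a persistent fault (e.g. a full disk)
        # does not become a tight restart loop; once the fault is fixed,
        # the next attempt can succeed without manual intervention.
        time.sleep(delay)
```

An init.d script could opt in by starting the daemon through a small wrapper that calls, say, `supervise(["/usr/sbin/exampled"])` (a hypothetical path), or opt out by starting the daemon directly. Neither choice has to be forced by the OS.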