no. you explain the situation well.
i realize that this is exactly the situation. and the problem is that once
you run out of memory, all kinds of stuff will start failing. fork will fail.
unless there is a single identifiable offender that the kernel kills and that
isn't automatically restarted, your chances of fixing the system are small.
running a production system on non-dedicated h/w is a bug in itself. you
need to have some idea of what's going to be running on your box.
i've run into all those cases. (except for legitimately running out of memory
with a completely sane system.)
running out of fds on unix is a bug. you can query the system for the number
of fds you are allowed to use, and you need to respect this number.
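for example, on a posix system a sketch like this asks for the limit at
startup (fdlimit is a made-up helper name, not from any particular library):

#include <sys/resource.h>

/* sketch: query the soft fd limit; budgeting against it is up to you */
rlim_t
fdlimit(void)
{
	struct rlimit rl;

	if(getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return 0;
	return rl.rlim_cur;	/* soft limit: fds we're actually allowed to use */
}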
running out of processes could be lots of things. in my experience it always
boiled down to either a configuration error or a lack of resource control,
e.g. something like
if(nrq >= cfg.rqmax){	/* >= so nrq never exceeds rqmax */
	werrstr("too many requests");
	return -1;
}
nrq++;	/* and nrq-- when the request completes */
look, i like to start with each "difficult" error (like running out of
memory) and handle the real cases that kill my program. generally this is
a small fraction of the total number of ways the application could fail.
naturally this is more work, but if you do it you have confidence that the
situations you actually saw are handled correctly.
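for instance, if the one out-of-memory failure i actually saw was the big
per-request buffer, i'd handle just that case rather than wrap every malloc.
a plan 9-flavored sketch (reqalloc is a made-up name):

#include <u.h>
#include <libc.h>

/* fail only this request on allocation failure; the server keeps going */
void*
reqalloc(ulong n)
{
	void *p;

	p = malloc(n);
	if(p == nil){
		werrstr("no memory for request buffer");
		return nil;	/* caller rejects the request */
	}
	return p;
}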
this is a bit of religion. hopefully i'm not too dogmatic. ;-)
- erik
On Fri Jun 9 19:25:35 CDT 2006, [EMAIL PROTECTED] wrote:
> > i'm skeptical that this is a real-world problem. i've not run out of memory
> > without hosing the system to the point where it needed to be rebooted.
>
> the problem we face is that we can't isolate our programs on dedicated
> hardware the way you isolate venti for example. if you ran a
> standalone venti server and ran out of memory you could argue that the
> crap has hit the fan irrevocably.
>
> some of our code looks a lot like a meta-kernel: we provide the
> capabilities for running other programs on many machines concurrently.
> in more cases than anyone will admit, those programs misbehave badly
> but we can't afford to throw in the towel every time.
>
> to illustrate from experience, just in the space of one month this
> year, we ran out of memory, out of processes to run, out of time and
> out of file descriptors in trivial cases. we simply must keep going or
> at least sit quietly and wait for the storm to pass...
>
> i'm sorry if i'm not explaining the situation too well.
>