excellent points; i agree.  there's no sense in masking errors
with pseudo-recovery.  good test coverage should expose programmer
misunderstanding.

if the system can't afford memory allocation failures, then
preallocating (statically or dynamically) and capping memory at the
maximum the system should ever need lets you simulate exhaustion in
testing and keeps memory usage and response times bounded.  watchdog
processes and memory checksums are possible additional measures.

> i think this has been mentioned on the list before (otherwise i wouldn't
> have known to look for it) but when considering error recovery tactics, it's
> worth looking at http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf
> ("Making reliable software systems in the presence of errors")
> 
> he summarises their approach to error recovery as follows:
> 
> - if you can't do what you want to do, die.
> - let it crash.
> - do not program defensively.
> 
> they built a telecoms switching system with a reported measured
> reliability of 99.9999999% following this philosophy.
> 
> see section 4.3 (page 101) for details.
> 
> the key is that another process gets notified of the error.
> 
> he makes this useful distinction between "error" and "exception":
> 
> - exceptions occur when the run-time system does not know what to do.
> - errors occur when the programmer does not know what to do.
> 
> i would suggest that most out-of-memory conditions are best
> classed as errors, not exceptions.
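
the "another process gets notified" part maps naturally onto posix
process supervision.  below is a rough sketch (not erlang's actual
mechanism, and the restart limit and worker body are made up for
illustration): the worker dies on an unrecoverable error instead of
patching around it, and the supervisor learns of the death through
waitpid() and restarts it.

    /* rough posix sketch of "let it crash" with supervision. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void worker(void)
    {
        void *p = malloc((size_t)1 << 20);
        if (p == NULL)
            abort();             /* can't do what we want: die */
        /* ... real work would go here ... */
        free(p);
        _exit(0);                /* clean completion */
    }

    int main(void)
    {
        for (int tries = 0; tries < 3; tries++) {
            pid_t pid = fork();
            if (pid < 0)
                return 1;        /* fork itself failed */
            if (pid == 0)
                worker();        /* child: no defensive recovery */
            int status;
            waitpid(pid, &status, 0); /* supervisor notified here */
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                return 0;        /* worker finished normally */
            fprintf(stderr, "worker died; restarting\n");
        }
        return 1;                /* gave up after repeated crashes */
    }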
