On Monday, 29 September 2014 at 02:57:03 UTC, Walter Bright wrote:
> I've said that processes are different, because the scope of
> the effects is limited by the hardware.
>
> If a system with threads that share memory cannot be restarted,
> there are serious problems with the design of it, because a
> crash and the necessary restart are going to happen sooner or
> later, probably sooner.
Right. But if the condition that caused the restart persists,
the process can end up in a cascading restart scenario. Simply
restarting on error isn't necessarily enough.
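
To make that concrete, here's a minimal supervisor sketch in D:
restart the worker process when it dies, but treat too many
crashes in a short window as a persistent condition and stop
restarting. The "./worker" binary and the thresholds are
placeholders, not anything from a real system:

```d
import core.thread : Thread;
import core.time : dur;
import std.datetime : Clock, SysTime;
import std.process : spawnProcess, wait;
import std.stdio : writeln;

// Restart a worker process on failure, but treat repeated crashes
// within a short window as a persistent condition: back off, then
// stop restarting and escalate instead of cascading forever.
void supervise(string[] cmd)
{
    SysTime[] crashes;
    enum maxRapid = 5;               // crashes tolerated...
    enum window = dur!"minutes"(1);  // ...within this window

    while (true)
    {
        auto pid = spawnProcess(cmd);
        if (wait(pid) == 0)
            return;                  // clean exit, nothing to do

        auto now = Clock.currTime;
        crashes ~= now;
        import std.algorithm : filter;
        import std.array : array;
        // Keep only the crashes inside the sliding window.
        crashes = crashes.filter!(t => now - t < window).array;

        if (crashes.length >= maxRapid)
        {
            // Restarting again would just cascade; escalate.
            writeln("crash loop detected; giving up and escalating");
            return;
        }
        // Exponential backoff before the next restart attempt.
        Thread.sleep(dur!"seconds"(1 << crashes.length));
    }
}

void main()
{
    supervise(["./worker"]);         // "./worker" is a placeholder
}
```

The interesting policy decision is what "escalating" means; that's
exactly the part a blind restart loop doesn't have.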
> I don't believe that the way to get 6 sigma reliability is by
> ignoring errors and hoping. Airplane software is most certainly
> not done that way.
I believe I was arguing the opposite. More to the point, I think
it's necessary to expect undefined behavior to occur and to plan
for it. I suspect we're on the same page here and just
miscommunicating.
> I recall Toyota got into trouble with their computer-controlled
> cars because of their idea of how to handle inevitable bugs and
> errors. It was one process that controlled everything. When
> something unexpected went wrong, it kept right on operating,
> any unknown and unintended consequences be damned.
>
> The way to get reliable systems is to design to accommodate
> errors, not pretend they didn't happen, or hope that nothing
> else got affected, etc. In critical software systems, that
> means shut down and restart the offending system, or engage the
> backup.
My point was that it's often more complicated than that. There
have been papers written on self-repairing systems, for example,
and on ways to design systems that stay durable even in the face
of internal errors. What I'm trying to say is that simply
aborting on error is too brittle in some cases, because it
addresses only one failure vector: memory corruption that is
unlikely to recur. I've watched always-on systems fall apart
under some unexpected but persistent condition, where simply
restarting doesn't actually help.
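
When the condition persists, the more useful move is often to
degrade to a backup path rather than restart the primary yet
again. A minimal sketch of that idea, with hypothetical names
(queryLiveService, cachedAnswer) standing in for a real service
and its degraded fallback:

```d
import std.stdio : writeln;

// Hypothetical stand-ins for a real primary service and a degraded
// fallback; queryLiveService models a persistent failure.
int queryLiveService() { throw new Exception("service unreachable"); }
int cachedAnswer() { return 42; }

// Try the primary a few times; if the failure persists, engage the
// backup rather than retrying (or restarting) forever.
T withFallback(T)(lazy T primary, lazy T backup, size_t retries = 3)
{
    foreach (i; 0 .. retries)
    {
        try
        {
            return primary;  // lazy parameter: evaluated on each use
        }
        catch (Exception e)
        {
            writeln("primary failed (attempt ", i + 1, "): ", e.msg);
        }
    }
    // The condition evidently persists; switch to the backup path.
    return backup;
}

void main()
{
    writeln(withFallback(queryLiveService(), cachedAnswer())); // 42
}
```

In a real system the fallback might be a standby replica or a
reduced-functionality mode; the point is just that the recovery
policy has to be richer than restart-and-hope.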