> You'd need to look at fraction of total that is data vs. code,
> then at fraction of total code that is going to cause hurt if
> flipped.  This stuff can have numbers attached.
> 
> Here's an example from my world. 1 MB of code, 32 MB of kernel,
> and 2GB minus that of data.  This is a lower end ratio as the
> nodes don't have much memory.
> 
> If the data is flipped, you're not going to know of errors unless
> you are looking for numerical instability.
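
To attach numbers to the quoted example, here is a rough sketch. It assumes a single flip lands uniformly at random anywhere in the 2 GB, which is my simplification and not something the example claims; the region names and the helper flip_share are made up for illustration.

```python
# Memory budget from the quoted example, in MB: 1 MB of code,
# 32 MB of kernel, and the rest of 2 GB as data.
total_mb = 2 * 1024
regions = {"code": 1, "kernel": 32}
regions["data"] = total_mb - sum(regions.values())


def flip_share(regions):
    """Return each region's share of total memory.

    Under the (assumed) uniform model this is also the chance that a
    single random bit flip lands in that region.
    """
    total = sum(regions.values())
    return {name: size / total for name, size in regions.items()}


shares = flip_share(regions)
for name, share in shares.items():
    print(f"{name:>6}: {share:.4%}")
```

Under that assumption data absorbs over 98% of the flips, and the boot-only and never-executed parts still have to come off the remaining code and kernel share.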

Also subtract out all of the kernel code which is boot-only:  it
needs to be uncorrupted for just the twinkling of an eye.  Almost
all of every format string (used or not) can be corrupted without
anything dramatic happening.  While you're in the kernel, the
exception-handling label stack could be totally trashed as long
as nobody invokes error() during this system call.  Or maybe a bit
flip rewrites an instruction to use %ebx instead of %eax, but
at a point when they both contain the same value.
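
The register case can be sketched with a toy model. This is not a real x86 decoder: the one-field "instruction" and the helpers source_reg and run are hypothetical. It borrows only the standard x86 register numbering (eax=0, ecx=1, edx=2, ebx=3), under which %edx and %ebx differ in exactly one bit, so the sketch flips that pair.

```python
# Standard x86 32-bit register numbering, index = 3-bit register field.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def source_reg(insn):
    """Decode the toy instruction: its low 3 bits name the source register."""
    return REGS[insn & 0b111]


def run(insn, regfile):
    """'Execute' the toy instruction by reading its source register."""
    return regfile[source_reg(insn)]


good = 0b010          # reads edx
bad = good ^ 0b001    # one flipped bit: now reads ebx

same = {"edx": 7, "ebx": 7}   # both registers hold the same value
diff = {"edx": 7, "ebx": 9}   # registers hold different values

print(source_reg(good), source_reg(bad))
print(run(good, same) == run(bad, same))   # flip is harmless here
print(run(good, diff) == run(bad, diff))   # flip changes the result here
```

The same corrupted instruction is invisible in the first case and wrong in the second; only the machine state at that moment differs.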

There's lots of stuff which doesn't have to be totally right to
"work", and even the stuff that must be 100% right may be fine
if it's wrong at the right time.

"Back in the old days", a lot of VAX-11/750's running BSD Unix
crashed because of parity errors in their TLB's.  750's running
VMS "didn't have this problem", because VMS would silently work
around it; BSD grew that code--see, for example, <[email protected]>.
Then bits could flip all the time with nobody noticing!

Dave Eckhardt
