Bruce McCulley wrote:

>> FWIW, I've seen different o/s architectures respond differently to
>> h/w faults, so not crashing under Linux would not prove the h/w is
>> clean. Incidentally, that raises an interesting question for discussion,
>> is it better for the o/s to be fault-tolerant and run through
>> problems or not?

That is an age-old question, one that dates back to before I started my
career over 25 years ago. I've admittedly become spoiled by the error
handling and error logging capabilities of (O)VMS, ULTRIX, and Tru64
UNIX (and its predecessors). I've worked in the customer support arena,
in a role where I was the back-up to the Customer Support Centers who
had to talk to Engineering, and I'm now in Software QC, where I see the
problems so that the customers hopefully never have to. Many years ago
we had a situation where correctable ECC errors were logged by VMS; the
idea was to keep track of a memory board that might be going bad but
was still usable, rather than just logging hard errors and shutting the
system down. The upshot was that customers didn't understand this, and
would insist on having all of the memory swapped out on the very first
burp. There were strong technical arguments on each side as to what the
correct process should have been. (BTW, these systems were NOT the
explicitly Fault Tolerant FTVax systems we later sold, nor were they
the FT Himalaya systems designed by Tandem, but "regular VAXes".) In
many cases, the application could be terminated (anybody remember the
old RSTS/E output "Program Lost - Sorry" messages?), and in others,
the operating system would be automatically brought down quasi-gracefully
and rebooted/re-started, with lots of information logged. Tru64 UNIX
now permits core dump naming: instead of an application dumping to
a file called 'core', it will dump to 'core.application_name.0' in a
single-system environment and 'core.nodename.application_name.0' in a
cluster, with the final digit incremented each time that particular
app dies on that particular system.

So, there are a lot of potential approaches to the problem, and part
of the value added by system vendors such as Compaq, Sun, HP, IBM et al.
is to engineer products that serve the customers' needs. Blue screens
don't, IMNSHO.

Bayard

**********************************************************
To unsubscribe from this list, send mail to
[EMAIL PROTECTED] with the following text in the
*body* (*not* the subject line) of the letter:
unsubscribe gnhlug
**********************************************************