Ian Smith wrote:
> Smells like flakey hardware .. intermittent, inexplicable glitches. It
> might survive hours on one workload, minutes on another, no sense to it?
> > All that I am seeing is that there is either a problem with the bios
> > (which I even reinstalled and that changed nothing in the functioning)
> > or something is going on with the OS.
> After you've thoroughly proven the hardware is AOK under sustained and
> varied pressure, then you can suspect software issues - which tend to be
> far more consistent and repeatable - but if the hardware's acting flakey
> then you likely won't see any consistency in software issues, which does
> seem to concur with your descriptions to date.
In my experience, hardware problems often show little pattern as to where
and when, during use of the machine, they cause the box to flake. Hardware
that malfunctions all the time is relatively easy to find.
The intermittent is the bane of all troubleshooting. I hate the intermittent
more than I hate anything. One pattern an intermittent will show is that,
as the bad part gets worse, the period between flakes gets shorter until at
some point the part dies completely. Initially the period can be quite long,
so proper troubleshooting is difficult: you can't troubleshoot during the
'in between' when it's not malfunctioning.
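One way to check for that shrinking period is to write down when each flake
happens and look at the gaps. The following is only a rough sketch: the
timestamps are made up for illustration, and the helper names
(intervals_hours, trend_slope) are my own, not from any tool.

```python
from datetime import datetime


def intervals_hours(timestamps):
    """Return the gaps, in hours, between consecutive failure times."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(b - a).total_seconds() / 3600.0 for a, b in zip(times, times[1:])]


def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical log of when the box flaked (made-up timestamps).
failures = [
    "2024-01-01T08:00:00",
    "2024-01-03T08:00:00",
    "2024-01-04T08:00:00",
    "2024-01-04T20:00:00",
    "2024-01-05T02:00:00",
]

gaps = intervals_hours(failures)
slope = trend_slope(gaps)
print("gaps (hours):", gaps)
print("trend:", "shrinking" if slope < 0 else "not shrinking")
```

A negative slope only says the gaps are getting shorter on average. With a
handful of data points that is a hint, not proof.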
I also have an 80/20 rule about hardware as to whether it is a hot or cold
failure. The 80% part is that most hardware problems occur when very dense
VLSI chips heat up. So a machine may not show any problem until it's been
powered up for a while. The other 20% is the cold start. Turn the box on and
there is immediately some kind of problem early in the course of booting.
Leave it powered on, walk away for 20 minutes to get a coffee, reset it
after it's had a chance to warm up, and now it works fine the rest of the
day. Both of these patterns are typical of hardware trouble.
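If you note how long the box had been powered on each time it flaked, a few
lines are enough to see which side of the hot/cold split you are on. This is
a sketch under assumptions: the 20-minute threshold just borrows the coffee
break above, and the uptimes are invented sample data.

```python
from collections import Counter

# Assumption: anything failing within the first 20 minutes counts as "cold".
WARM_UP_MINUTES = 20


def classify(uptime_minutes):
    """Label one failure by how long the machine had been powered on."""
    return "cold" if uptime_minutes < WARM_UP_MINUTES else "hot"


def tally(uptimes):
    """Return the fraction of cold and hot failures in a list of uptimes."""
    if not uptimes:
        return {"cold": 0.0, "hot": 0.0}
    counts = Counter(classify(u) for u in uptimes)
    return {kind: counts[kind] / len(uptimes) for kind in ("cold", "hot")}


# Hypothetical uptimes (in minutes) at the moment of each failure.
uptimes_at_failure = [2, 95, 240, 180, 1, 310, 125, 60, 400, 75]

shares = tally(uptimes_at_failure)
print(shares)
```

A lopsided split either way points toward heat (or the lack of it) and so
toward hardware. An even spread tells you little.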
A software error, on the other hand, most of the time shows itself as a well
defined repeatable sequence of steps that cause the problem every time the
sequence is executed. This can also usually be easily reproduced by others
running the same, or similar enough, platform(s) by executing said sequence.
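A crude way to test for that kind of repeatability is to run the suspect
sequence a number of times and count the failures. The sketch below assumes
the sequence can be expressed as a single command whose nonzero exit status
means failure. The stand-in command here simply exits with status 1, and the
verdict wording is mine.

```python
import subprocess
import sys


def failure_rate(cmd, runs=10):
    """Run cmd several times and return the fraction of nonzero exits."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


def verdict(rate):
    """Turn a failure rate into a rough hint, not a diagnosis."""
    if rate == 0.0:
        return "never fails: problem not reproduced"
    if rate == 1.0:
        return "fails every time: consistent with a software bug"
    return "fails sometimes: intermittent, keep hardware on the suspect list"


# Stand-in for the real sequence of steps: a command that always fails.
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]

rate = failure_rate(always_fails, runs=5)
print(rate, verdict(rate))
```

Failing every time is only consistent with software. A part that has died
outright will also fail every time, so this narrows things down rather than
settling them.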
This can get quite sticky as even the BIOS code is software! Bad buggy BIOS
code having a bad reaction to the compiled boot loader binary, even though
probably quite rare, is not totally outside the realm of possibility.
Somewhere very near the root of the geometric logic tree of troubleshooting
you need to be able to drive a wedge between hardware and software in a
divide and conquer kind of way. Making arbitrary assumptions early on as to
which side is the problem will blind the troubleshooter to avenues of
hypothesizing this and testing that. Assuming without proof that the
hardware is 100% OK, so it must be a software problem, is a mistake, and
vice versa.
And driving that wedge might be as simple as installing another OS, such as
a Linux distro or Windows, on the box. If it is truly a hardware problem it
may continue to
malfunction and cause trouble no matter what the choice of OS. Or it may
not, as sometimes buggy hardware design failures are compensated for with
workarounds in drivers, thus hiding the flaw. It's the old 'have a <insert
brand name> box with xyz hardware' with a known problem and the fix is to
download and install <insert brand name> driver revision such and such from
Since these kinds of things are not generally propagated far and wide, an OS
such as FreeBSD may not be privy to such bad hardware details. Sometimes the
developers do incorporate hacks for hardware. If you can accurately identify
such a situation, the most likely way to get it fixed for the long run is to
file a proper PR. If it is done well enough and catches the eye of a dev who
is interested and actually possesses the piece of hardware, a workaround
may get coded and become a part of FreeBSD.
Just a lot of generalizations here. As always, there is the YMMV clause. :-)