On Mon, Jan 09, 2012 at 09:55:58AM -0800, Freddie Cash wrote:
> On Mon, Jan 9, 2012 at 9:50 AM, John Nielsen <[email protected]> wrote:
> > From what you've said I strongly suspect that you have some kind of
> > hardware issue. Dodgy RAM is my first guess, something cooling-related is
> > my 2nd, and PSU is my 3rd. It is a little suspicious that you only started
> > having problems after your upgrade but it could be coincidence or it could
> > be something about the new software tickling the hardware differently than
> > the old.
>
> That's what we're leaning toward as well. We're planning on doing a
> BIOS upgrade (betadrive is running v2.00 and alphadrive is v1.00),
> then a memtest86+ run, then check firmware on the SATA controllers.

For hardware/system troubleshooting advice:

1) BIOS upgrade -- worth doing, since the BIOS is also responsible for
   ACPI bits and other "configuration model" pieces of a system.
2) BIOS settings -- make sure they're all 100% identical between both
   systems.
3) Controller firmware -- please make sure these are the same (your
   controllers between boxes appear to be the same model).
4) Flaky PSU -- voltages may be dropping below or rising above the
   levels the mainboard can handle. As someone who buys Supermicro
   exclusively for their systems, I can tell you that their PSUs
   ("Ablecom") are quite cheap/horrible. It's worth purchasing a
   replacement -- if it doesn't turn out to be the problem, you now
   have a spare PSU (which is good to have -- our last system failure
   was due to a blown PSU).
5) Flaky RAM -- memtest86+ can help here, mostly but not entirely; a
   clean pass doesn't prove the RAM is good.
6) Flaky mainboard -- it happens. Really. :-)

For OS advice:
Compare rc.conf, loader.conf, and so on. For example, is one system
using powerd(8) while the other isn't?
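
A quick way to do that comparison is to parse both files and print only
the settings that differ. What follows is a minimal sketch, not a
definitive tool: the helper names (parse_conf, diff_conf) are made up
for illustration, and it assumes simple KEY="value" lines as found in
rc.conf and loader.conf (no multi-line values, no included files).

```python
import shlex
import sys


def parse_conf(text):
    """Parse rc.conf/loader.conf-style KEY="value" lines into a dict.

    Assumption: one setting per line. Later lines override earlier ones.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            # Strips the surrounding quotes and any trailing "# comment".
            parts = shlex.split(value, comments=True)
        except ValueError:
            # Unbalanced quotes: keep the raw text rather than guess.
            parts = [value.strip()]
        settings[key.strip()] = " ".join(parts)
    return settings


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ.

    None means the key is absent from that file.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: copy each box's file locally, then pass both paths.
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        differences = diff_conf(parse_conf(fa.read()), parse_conf(fb.read()))
    for key, (left, right) in differences.items():
        print(f"{key}: {left!r} vs {right!r}")
```

Run it once per file pair (rc.conf, loader.conf, sysctl.conf, ...). A
setting such as powerd_enable that shows up on only one side is printed
with None on the other.
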
> If none of the above helps, we're thinking of swapping the CPUs
> between the two systems to see if the problems stay with the box or
> follow the CPU.

I was helping out someone on a public forum earlier this week who
purchased a Dell desktop system that started behaving oddly. memtest86+
claimed all his DIMMs were bad (regardless of slot), and it reported the
same thing for replacement DIMMs. Dell kept insisting he reload the OS,
or else they could try a motherboard swap, blah blah blah. What amused
me was that nobody looked at the CPU: Intel Core i3-550, which contains
an on-die MCH (memory controller). Chances are the MCH is going bad,
which means it's time to replace the CPU.

CPUs rarely go bad, but now with on-die MCHs, on-die VGA, etc. it's
becoming much more plausible that the physical CPU needs to be replaced.
They've become practically computers inside of a computer. :-)
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |