On Mon, Jan 09, 2012 at 09:55:58AM -0800, Freddie Cash wrote:
> On Mon, Jan 9, 2012 at 9:50 AM, John Nielsen <li...@jnielsen.net> wrote:
> > From what you've said I strongly suspect that you have some kind of 
> > hardware issue. Dodgy RAM is my first guess, something cooling-related is 
> > my 2nd, and PSU is my 3rd. It is a little suspicious that you only started 
> > having problems after your upgrade but it could be coincidence or it could 
> > be something about the new software tickling the hardware differently than 
> > the old.
> 
> That's what we're leaning toward as well.  We're planning on doing a
> BIOS upgrade (betadrive is running v2.00 and alphadrive is v1.00),
> then a memtest86+ run, then check firmware on the SATA controllers.

For hardware/system troubleshooting advice:

1) BIOS upgrade -- the BIOS is also what's responsible for ACPI bits
   and other "configuration model" pieces of a system,
2) BIOS settings -- make sure they're all 100% identical between both
   systems,
3) Controller firmware -- please make sure these are the same (your
   controllers between boxes appear to be the same model),
4) Flaky PSU -- voltages may drop below or rise above the levels the
   mainboard can handle.  As someone who buys Supermicro exclusively
   for their systems, I can tell you that their PSUs ("Ablecom") are
   quite cheap/horrible.  It's worth purchasing a replacement -- if it
   doesn't turn out to be the problem, you now have a spare PSU (which
   is good to have -- our last system failure was due to a blown PSU).
5) Flaky RAM -- memtest86+ can help here, though it won't catch everything.
6) Flaky mainboard -- it happens.  Really.  :-)

For OS advice:

Compare rc.conf, loader.conf, and so on.  For example, is one system
using powerd(8) while the other isn't?
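
A rough sketch of that comparison, in case it saves some eyeballing.
It assumes you've copied each box's rc.conf (or loader.conf) somewhere
local; the function names and file paths are made up for illustration,
and the parser only handles simple KEY="value" lines, not everything
sh(1) would accept:

```python
def parse_conf(text):
    """Parse simple rc.conf-style KEY="value" lines into a dict.

    Blank lines and comment lines are skipped. This is a simplification:
    it does not evaluate shell syntax the way rc(8) would.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep what sits between the matching quotes.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop any trailing comment.
            value = value.split("#", 1)[0].strip()
        settings[key.strip()] = value
    return settings


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ.

    A value of None means the key is absent from that file.
    """
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


def compare_files(path_a, path_b):
    """Read two config files (hypothetical local copies) and diff them."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_conf(parse_conf(fa.read()), parse_conf(fb.read()))
```

Something like compare_files("alphadrive-rc.conf", "betadrive-rc.conf")
would then list only the knobs that differ, so a powerd_enable set on
one box and missing on the other shows up with None on the missing side.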

> If none of the above helps, we're thinking of swapping the CPUs
> between the two systems to see if the problems stay with the box or
> follow the CPU.

I was helping out someone on a public forum earlier this week who
purchased a Dell desktop system that started behaving oddly.  memtest86+
claimed all his DIMMs were bad (regardless of slot), and said the same
of the replacement DIMMs.  Dell kept insisting he reload the OS, or
else they could try a motherboard swap, blah blah blah.  What amused me
was that nobody looked at the CPU: Intel Core i3-550, which contains an
on-die MCH.  Chances are the MCH is going bad, which means time to
replace the CPU.

CPUs rarely go bad, but now with on-die MCHs, on-die VGA, etc. it's
becoming much more plausible that the physical CPU needs to be replaced.
They've become practically computers inside of a computer.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
