On Mon, Apr 13, 2009 at 12:07:25PM -0400, John Almberg wrote:
> I have what looks like a hardware problem with an Intel 1U server,  
> which I am using mainly as a mysql database server for some of my  
> bigger website clients.
> The server went down last week with a badly corrupted file system.
> After spending a day trying to fix the file system, we gave up and  
> did a fresh install of FreeBSD, PF, and mysql, using our daily  
> backups to restore the database. It all seemed to work fine until I  
> switched the websites from the temporary database server that I had  
> been using, onto the restored server.
> The database ran well for about 2 minutes, then the server crashed  
> again. The filesystem was again corrupted so badly that we could not  
> even log in to look at the logs.
> We've reinstalled FreeBSD again, just to be able to SSH into the box.  
> It looks like there is probably a hardware problem, like a bad power  
> supply or overheating CPU that fails when the load of the database is  
> applied.
> Problem is, I have no idea how to determine which bits are failing.  
> Can anyone suggest a favorite book or website that focuses on how to  
> troubleshoot hardware issues?

First things first; if the machine is still in warranty, don't mess with
it but send it back to the manufacturer and demand a replacement.

If the machine is out of warranty, you might consider replacing it
altogether. My employer's IT department ditches PCs and servers at the first
failure after the warranty runs out. According to them it's cheaper than
repairing them.

But if you want to have a go, this might help:

Basically, it's just a process of elimination.

First check if your machine is the only one having problems at the
hosting site. Maybe they have unstable electrical power.

Then make sure that all expansion cards and RAM are well-seated, and
that all connectors are OK. Also check that there is no dust build-up on
e.g. fans and heatsinks. If necessary, clean carefully with (dry,
oil-free) compressed air. Dust can lead to short circuits or reduced
cooling. Next, look for capacitors on the motherboard that have leaked
fluid or have bulging metal end plates; those are dead or dying. Failed
capacitors are a leading cause of motherboard failure. It is possible to
replace them, but you'll need the right equipment.

Install a monitoring program like mbmon or healthd, and have it log to
another machine or a USB stick mounted synchronously, so the last
readings before a crash survive. Monitor CPU temperature, fan speeds and
the different voltages. Not all power supplies are created equal. See
the articles at Tom's Hardware.
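
As a rough illustration of the logging idea, here is a minimal Python
sketch. It is not how mbmon or healthd work; where the readings come from
is left open, and the key names ("cpu_temp", "fan_rpm", "+12V", ...) and
the temperature and fan limits are my own assumptions. The 5% voltage
margin is the usual ATX tolerance for the positive rails. Each line is
flushed and fsync'ed so it should reach the disk even if the box dies
moments later.

```python
import os
import time

# Nominal ATX rail voltages; +/-5% is the usual tolerance on these rails.
NOMINAL_VOLTS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05
MAX_CPU_TEMP_C = 70.0  # assumed limit; check the datasheet for your CPU
MIN_FAN_RPM = 1000.0   # assumed limit; depends on the fans fitted


def out_of_range(reading):
    """Return the names of all values in `reading` that look suspicious."""
    bad = []
    for rail, nominal in NOMINAL_VOLTS.items():
        value = reading.get(rail)
        if value is not None and abs(value - nominal) > nominal * TOLERANCE:
            bad.append(rail)
    temp = reading.get("cpu_temp")
    if temp is not None and temp > MAX_CPU_TEMP_C:
        bad.append("cpu_temp")
    fan = reading.get("fan_rpm")
    if fan is not None and fan < MIN_FAN_RPM:
        bad.append("fan_rpm")
    return bad


def append_reading(path, reading, now=None):
    """Append one log line and force it to disk before returning."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{k}={v}" for k, v in sorted(reading.items()))
    flagged = ",".join(out_of_range(reading)) or "-"
    with open(path, "a") as log:
        log.write(f"{stamp} {fields} flagged={flagged}\n")
        log.flush()
        os.fsync(log.fileno())  # do not leave the line in the OS cache
```

Call append_reading() every few seconds with whatever your monitoring
tool reports, pointing it at a file on the USB stick. After the next
crash, the last few lines show whether a voltage sagged or the CPU
temperature climbed just before the machine went down.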

If you've found nothing so far, it's time to start swapping out
components, starting with the power supply.

R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
