On Mon, Apr 13, 2009 at 12:07:25PM -0400, John Almberg wrote:
> I have what looks like a hardware problem with an Intel 1U server,
> which I am using mainly as a mysql database server for some of my
> bigger website clients.
>
> The server went down last week with a badly corrupted file system.
>
> After spending a day trying to fix the file system, we gave up and
> did a fresh install of FreeBSD, PF, and mysql, using our daily
> backups to restore the database. It all seemed to work fine until I
> switched the websites from the temporary database server that I had
> been using, onto the restored server.
>
> The database ran well for about 2 minutes, then the server crashed
> again. The filesystem was again corrupted so badly that we could not
> even log in to look at the logs.
>
> We've reinstalled FreeBSD again, just to be able to SSH into the box.
> It looks like there is probably a hardware problem, like a bad power
> supply or overheating CPU that fails when the load of the database is
> applied.
>
> Problem is, I have no idea how to determine which bits are failing.
> Can anyone suggest a favorite book or website that focuses on how to
> troubleshoot hardware issues?
First things first: if the machine is still under warranty, don't mess
with it; send it back to the manufacturer and demand a replacement.

If the machine is out of warranty, you might consider replacing it
altogether. My employer's IT department ditches PCs and servers at the
first failure after the warranty runs out. According to them it's
cheaper than repairing them.

But if you want to have a go, this might help:
http://www.daileyint.com/hmdpc/manual.htm

Basically, it's a process of elimination. First check whether your
machine is the only one having problems at the hosting site. Maybe they
have unstable electrical power.

Then make sure that all expansion cards and RAM are well-seated, and
that all connectors are OK. Also check that there is no dust build-up
on e.g. fans and heatsinks. If necessary, clean carefully with (dry,
oil-free) compressed air. Dust can lead to short circuits or reduced
cooling.

Next, look for capacitors on the motherboard that have leaked fluid or
have bulging metal end plates; those are dead or dying. They are a
leading cause of motherboard failure. It is possible to replace them,
but you'll need the right equipment:
http://www.tomshardware.com/reviews/fixing-motherboard,1606.html

Install a monitoring program like mbmon or healthd, and have it log to
another machine or to a USB stick mounted synchronously, so that the
log survives a crash. Monitor CPU temperature, fan speeds and the
different voltages.

Not all power supplies are created equal. See the articles at Tom's
Hardware:
http://www.tomshardware.com/reviews/Components,1/Power-Supplies,6/

If you've found nothing so far, it's time to start swapping out
components, starting with the power supply.

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
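A minimal sketch of the monitoring advice (log sensor readings so the
last entries survive a crash): each reading is appended to a file and
fsync'ed before the function returns, which is the per-write equivalent
of a synchronously mounted USB stick. It assumes you already have the
values; parsing mbmon or healthd output is not shown. The Reading
fields, the LIMITS values and the log path are hypothetical
placeholders, not figures for any particular board.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class Reading:
    """One set of sensor values (hypothetical fields; adapt to your tool's output)."""
    cpu_temp_c: float
    fan_rpm: int
    vcore: float


# Hypothetical (low, high) limits; None means "no limit on that side".
# Take real values from your motherboard and CPU documentation.
LIMITS = {
    "cpu_temp_c": (None, 70.0),
    "fan_rpm": (1500, None),
    "vcore": (1.10, 1.45),
}


def out_of_range(reading):
    """Return human-readable messages for values outside LIMITS."""
    problems = []
    for name, (low, high) in LIMITS.items():
        value = getattr(reading, name)
        if low is not None and value < low:
            problems.append(f"{name}={value} below {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} above {high}")
    return problems


def log_reading(path, reading, now=None):
    """Append one line to `path` and fsync it before returning."""
    now = time.time() if now is None else now
    problems = out_of_range(reading)
    status = "ALARM " + "; ".join(problems) if problems else "ok"
    line = (f"{now:.0f} temp={reading.cpu_temp_c} fan={reading.fan_rpm} "
            f"vcore={reading.vcore} {status}\n")
    with open(path, "a") as log:
        log.write(line)
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to commit it to the device
    return line
```

Call log_reading() every few seconds with a path on the USB stick. To
log to another machine instead, Python's logging.handlers.SysLogHandler
(for example with address=(host, 514)) can send the same line to a
remote syslogd, which must be configured to accept remote messages.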