John Almberg wrote:

> Thanks for all the tips. At least I have something to start with.
> The guys in the data center reinstalled FreeBSD (the filesystem was
> totally corrupted again), and then ran what they called "SMART test",
> which might be smartctl, and said the hard drives look good.
> I am now able to get back in.
> So the system ran fine until I put a load on it with the database
> (many transactions a second). This corrupted the file system again.
> So I guess I need to load it enough to produce error messages
> (hopefully) but not enough to destroy the file system again.
> Sounds like fun :-(
> This is an Intel server, not a crummy white box, so hopefully it is
> smart enough to monitor its own hardware at least a bit. We'll see.

Just a tidbit or two. If it has an ICHR type South Bridge with what Intel 
calls "Matrix RAID", there have been reported problems with trying to use the 
RAID functionality. If you are not using the RAID, make sure the data center 
guys turn it off in the BIOS. 

Whenever I see reports of data corruption alongside SMART saying the drives 
are "good", I think disk controller. It does seem strange if the problem was 
not present before the "power fluctuations". But where hardware damage occurs 
can be funky. With the box I once had that took a direct lightning strike, it 
was interesting to see where the lightning bounced around inside.

If this is a 1U pizza box with only one power supply, I would suspect the 
power supply was damaged by the power problem. If it is a relatively low 
wattage unit, the damage may have left it without enough headroom to provide 
clean, regulated DC when under full load.

I remember a software company I worked for a few years ago that stuck the old 
WORM drives in an HP Vectra desktop with only a 135 watt power supply. You 
could see the power go all wonky on an oscilloscope as soon as the WORM drive 
started up, even though the box had worked well up until that point. 

At any rate, this all sounds like hardware to me. If it wasn't doing any of 
this before the so-called "power event", then I believe there has been 
hardware damage. Unless you are co-locating your own hardware, it is the 
responsibility of the data center to provide you with functional hardware. 
After the first go-round, with the same problem resurfacing, they should have 
yanked the box and just replaced it: put a good one in service and 
troubleshoot the bad one offline. If they can't hold up their end of the 
deal, you need to be looking somewhere else.
