John Almberg wrote:
> Thanks for all the tips. At least I have something to start with.
> The guys in the data center reinstalled FreeBSD (the filesystem was
> totally corrupted again), and then ran what they called "SMART test",
> which might be smartctl, and said the hard drives look good.
> I am now able to get back in.
> So the system ran fine until I put a load on it with the database
> (many transactions a second). This corrupted the file system again.
> So I guess I need to load it enough to produce error messages
> (hopefully) but not enough to destroy the file system again.
> Sounds like fun :-(
> This is an Intel server, not a crummy white box, so hopefully it is
> smart enough to monitor its own hardware at least a bit. We'll see.
Just a tidbit or two. If it has an ICHR type South Bridge with what Intel
calls "Matrix RAID", there have been reported problems with trying to use the
RAID functionality. If you are not using the RAID, make sure the data center
guys turn it off in the BIOS.
Whenever I see these kinds of reports about data corruption correlating with
SMART saying the drives are "good" I think disk controller. It does seem
strange if the problem was not present previous to the "power fluctuations".
But where hardware damage occurs can be funky. At least with the box I once
had that took a direct lightning strike it was interesting to see where the
lightning bounced around inside.
If this is a 1u pizza box with only one power supply I would suspect the
power supply of being damaged from the power problem. If it is a relatively
low wattage unit then the damage sustained may have left it without enough
overhead to provide clean, regulated DC when under full load.
I remember a software company I worked for a few years stuck the old WORM
drives in an HP Vectra desktop that only had a 135 watt power supply. You
could see the power go all wonky with an oscilloscope as soon as that WORM
drive started up, but the box worked well up until that point.
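If you want to see whether the corruption tracks load before the box gets
swapped, one option is to step the disk load up in stages and verify each
write, rather than turning the database loose on it. Below is a rough sketch
in Python, not a tested diagnostic. The function names, the step sizes, and
the use of a scratch directory are all my own assumptions. It writes random
data, reads it back, and compares SHA-256 digests, stopping at the first
mismatch.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mb=16, block_kb=64):
    """Write size_mb of random data to a scratch file in `directory`,
    read it back, and return True if the SHA-256 digests match."""
    block = block_kb * 1024
    blocks = (size_mb * 1024 * 1024) // block
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, prefix="loadtest-")
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                chunk = os.urandom(block)
                written.update(chunk)
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(block), b""):
                read_back.update(chunk)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file


def ramp(directory, sizes_mb=(16, 64, 256)):
    """Step the load up (hypothetical step sizes) and stop at the first
    mismatch. Returns the failing size in MB, or None if all steps pass."""
    for size in sizes_mb:
        if not write_and_verify(directory, size_mb=size):
            return size
    return None
```

Calling ramp("/var/tmp") (or any scratch directory on the suspect disk) runs
the three steps. Small files will mostly be read back from the buffer cache,
so a pass at small sizes proves little; the sizes need to approach or exceed
RAM before the read really hits the disk. Keep an eye on dmesg and
/var/log/messages while it runs, since controller errors may show up there
before a checksum ever fails.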
At any rate, this all sounds like hardware to me. If it wasn't doing any of
this before the so-called "power event" then I believe there has been
hardware damage. Unless you are co-locating your own hardware it is the
responsibility of the data center to provide you with functional hardware.
After the first go around and the same problem resurfacing they should have
yanked the box and just replaced it. Put a good one in service and
troubleshoot the bad one off line. If they can't hold up their end of the
deal you need to be looking somewhere else.