On Fri, Aug 22, 2003 at 10:13:30AM -0700, Steve Warwick wrote:

> I have just had my 2nd HD crash in a year - different machine, different
> type of drive etc - the only consistency is the OS and the installed
> software.

I think you've just been unlucky in buying two hard drives (or
whatever) that failed prematurely.  If there were such a
bug in FreeBSD, a lot more people than you would be complaining, and
heaven and earth would be moved in order to fix it.
 
> It looks like this is some kind of overflow????

The only time I ever saw something similar, it was a Linux box and one
of the onboard chipsets (northbridge, southbridge -- I can never
remember which is which) tended to overheat in the particular
customized chassis those machines came in.  In about 4-5 machines out
of 80, the boxes developed a failure mode where the hard drive would
be rather scrambled, with portions of one file appearing mixed into
other files, and the whole FS garbaged.  Swapping out the broken
motherboards and fitting a suitable heat-sink sorted out the problem.
 
> This problem seems to start after about a month and is indicated by there
> being fragments of the kernel config data in the daily kernel log messages.
> I asked about that on this list but people seemed to think it was just some
> kind of log rotation.
> 
> I caught the machine a couple of months ago with nearly all the swap used
> (800meg out of 1 gig) and rebooted which kept the machine happy.

That can be an indication of a disk on the verge of failure.  Were
there any other indications? Messages on the console?  Unexplained
program core dumps?

> I recently noticed the kernel log messages had the config fragments again
> and was going to do a reboot - but alas too late. The machine is down, HD is
> damaged and we are trying to get data off the drive...
> 
> This is exactly the problem that occurred 6 months ago. That time I put it
> down to lousy overheated hosting but now I don't have that excuse.
> 
> Has anyone seen this before?
> 
> Does FreeBSD have to be re-booted every month for safety?
> 
> Should I give up and use Linux?
> 
> A frustrated sysadmin

Yup.  I know the feeling.  However, I don't think you can blame
FreeBSD for the problem here -- it smells of hardware failure to me.
This could certainly be the result of overheating, even if you have
moved the machine to a better environment. Some hardware is more
equally ventilated than others. Heat stress is exactly the sort of
thing that can cause a machine to go tits-up after a few months.
Another candidate is electrostatic discharge, but unless you're in the
habit of opening up the box and working on the innards without using a
proper ground strap, that's probably not a concern. Besides, ESD
doesn't usually destroy hard drives.

        Cheers,

        Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                       26 The Paddocks
                                                      Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey         Marlow
Tel: +44 1628 476614                                  Bucks., SL7 1TH UK
