On Fri, Aug 22, 2003 at 10:13:30AM -0700, Steve Warwick wrote:

> I have just had my 2nd HD crash in a year - different machine, different
> type of drive etc - the only consistency is the OS and the installed
> software.

I think that you've just been unlucky in buying two hard drives or whatever which have failed prematurely. If there were such a bug in FreeBSD, a lot more people than you would be complaining, and heaven and earth would be moved in order to fix it.

> It looks like this is some kind of overflow????

The only time I ever saw something similar, it was a Linux box and one of the onboard chipsets (northbridge, southbridge -- I can never remember which is which) tended to overheat in the particular customized chassis those machines came in. In about 4-5 machines out of 80, the boxes developed a failure mode where the hard drive would be rather scrambled, with portions of one file appearing mixed into other files, and the whole FS garbaged. Swapping out the broken motherboards and fitting a suitable heat-sink sorted out the problem.

> This problem seems to start after about a month and is indicated by there
> being fragments of the kernel config data in the daily kernel log messages.
> I asked about that on this list but people seemed to think it was just some
> kind of log rotation.
>
> I caught the machine a couple of months ago with nearly all the swap used
> (800meg out of 1 gig) and rebooted which kept the machine happy.

That can be an indication of a disk on the verge of failure. Were there any other indications? Messages on the console? Unexplained program core dumps?

> I recently noticed the kernel log messages had the config fragments again
> and was going to do a reboot - but alas too late. The machine is down, HD is
> damaged and we are trying to get data off the drive...
>
> This is exactly the problem that occurred 6 months ago. That time I put it
> down to lousy overheated hosting but now I don't have that excuse.
>
> Has anyone seen this before?
>
> Does FreeBSD have to be re-booted every month for safety?
>
> Should I give up and use Linux?
>
> A frustrated sysadmin

Yup. I know the feeling.

However, I don't think you can blame FreeBSD for the problem here -- it smells of hardware failure to me. This could certainly be the result of overheating even if you have moved the machine to a better environment. Some hardware is more equally ventilated than others. Heat stress is exactly the sort of thing that can cause a machine to go tits-up after a few months.

Another candidate is electrostatic discharge, but unless you're in the habit of opening up the box and working on the innards without using a proper ground strap, that's probably not a concern. Besides, ESD doesn't usually destroy hard drives.

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
26 The Paddocks, Savill Way, Marlow, Bucks., SL7 1TH UK
PGP: http://www.infracaninophile.co.uk/pgpkey
Tel: +44 1628 476614