> Derek Ragona wrote:
> > At 09:00 AM 11/14/2007, Barnaby Scott wrote:
> >> I suspect I already know the answer to this, which is that the
> >> trouble I am having is nothing to do with the OS at all, but I have
> >> to ask, because I am otherwise up against a total brick wall!
> >>
> >> I bought a second-hand Dell PowerEdge 4600 and installed FreeBSD 6.2
> >> earlier this year. I had it set up with RAID5 using its PERC3/DC
> >> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
> >> it worked faultlessly as a Samba server for several months.
> >>
> >> At the beginning of October, it went down, reporting a mismatch
> >> between the configuration on the NVRAM and the disks. With help from
> >> Dell support, I managed to recreate the RAID array and it worked
> >> again for a month.
> >>
> >> In early November it happened again, and has kept happening since. At
> >> one point it appeared that the backplane was faulty, so I replaced
> >> that, but I cannot keep the server up for more than a day or so
> >> without this 'mismatch' problem.
> >>
> >> What about diagnostics on the hardware you may ask? I have run all
> >> the diagnostic tools that Dell can supply - several times - and the
> >> server declares itself to be totally fault-free.
> >>
> >> My specific questions therefore:
> >>
> >> Is there any way at all that FreeBSD could be involved with this
> >> problem? (I did notice for example that the Dell PERC3/DC controller
> >> was not in the list of supported hardware - but then again, why did
> >> it work for several months?)
> >>
> >> Can I use FreeBSD to tell me anything about the fault that Dell's
> >> diagnostic tools haven't found?
> >>
> >> (I do hope someone might be able to help - Dell are trying to get me
> >> to switch to a 'supported' OS!)
> >>
> >> Thanks
> >>
> >> Barnaby Scott
> >
> > It doesn't sound like an OS issue, as you set up the RAID outside the
> > OS. It may be a bad drive or drives. Most RAID drives have RAID
> > information written to the drives, and if this becomes unreadable you
> > will have RAID faults.
> >
> > Another likely culprit is heat. Overheating drives often fail. Are
> > you sure the temperature in the drive enclosure is OK?
> >
> > If you can, run diagnostics on the drives; this usually requires
> > running them with the drives taken out of the RAID array, though.
> >
> > -Derek
>
> Thanks for replying - as I said, this is a long shot trying to see if
> there is any OS involvement.
>
> The drives are fine - I have used two different tools to analyse them
> while the computer is booted from a live CD and the RAID configuration
> cleared on the controller. Besides, you would expect one drive to fail
> at a time, and if this happened, the hot spare would surely be pressed
> into service. Nothing like this has happened though - the controller is
> reporting several drives (not always the same ones) failed
> simultaneously, but when the array is re-created from the disks,
> everything works fine. Problem is, it goes down again a day or so later.
>
> As for heat, there is nothing being reported there and the fans that
> cool that area are working.
>
> Any other ideas gratefully received!
>
> Barnaby Scott

This is very unlikely to be OS-related, but here are a few pointers:

1) Check the make and model of the drives. Some SCSI drive models shipped
a while ago with a firmware glitch that made them disconnect from a RAID
array. I have personally run into this with Seagate U320 drives.

2) What happened in October? Did anything change in hardware, software,
or power?

3) For an NVRAM/disk mismatch, I'd check the controller. Is the backup
battery present but weak?

4) Unlikely to be the source, but test your physical RAM with Memtest86+
and check that the power supply is sufficient and working properly.

_______________________________________________
email@example.com mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions