Derek Ragona wrote:
At 09:00 AM 11/14/2007, Barnaby Scott wrote:
I suspect I already know the answer to this, which is that the trouble
I am having is nothing to do with the OS at all, but I have to ask,
because I am otherwise up against a total brick wall!
I bought a second-hand Dell Poweredge 4600 and installed FreeBSD 6.2
earlier this year. I had it set up with RAID5 using its PERC3/DC
controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
it worked faultlessly as a Samba server for several months.
At the beginning of October, it went down, reporting a mismatch
between the configuration on the NVRAM and the disks. With help from
Dell support, I managed to recreate the RAID array and it worked again
for a month.
In early November it happened again, and has kept happening since. At
one point it appeared that the backplane was faulty, so I replaced
that, but I cannot keep the server up for more than a day or so
without this 'mismatch' problem.
What about diagnostics on the hardware, you may ask? I have run all the
diagnostic tools that Dell can supply - several times - and the server
declares itself to be totally fault-free.
My specific questions therefore:
Is there any way at all that FreeBSD could be involved with this
problem? (I did notice for example that the Dell PERC3/DC controller
was not in the list of supported hardware - but then again, why did it
work for several months?)
Can I use FreeBSD to tell me anything about the fault that Dell's
diagnostic tools haven't found?
(I do hope someone might be able to help - Dell are trying to get me
to switch to a 'supported' OS!)
It doesn't sound like an OS issue, as you set up the RAID outside the
OS. It may be a bad drive or drives. Most RAID setups write RAID
information to the drives themselves, and if this becomes unreadable you
will have RAID faults.
Another likely culprit is heat. Overheating drives often fail. Are you
sure the temperature in the drive enclosure is OK?
If you can, run diagnostics on the drives; this usually requires taking
the drives out of the RAID array, though.
Thanks for replying - as I said, this is a long shot trying to see if
there is any OS involvement.
The drives are fine - I have used two different tools to analyse them
while the computer is booted from a live CD and the RAID configuration
cleared on the controller. Besides, you would expect one drive to fail
at a time, and if this happened, the hot spare would surely be pressed
into service. Nothing like this has happened though - the controller is
reporting several drives (not always the same ones) failed
simultaneously, but when the array is re-created from the disks,
everything works fine. Problem is, it goes down again a day or so later.
As for heat, there is nothing being reported there and the fans that
cool that area are working.
Any other ideas gratefully received!