Matthew Hagerty wrote:
Alex Zbyslaw wrote:
Matthew Hagerty wrote:
Can anyone shed some light on this, give me some options to try?
What happened to kernel panics and such when there were serious
errors going on? The only glimmer of information I have is that
*one* time there was an error on the console about there not being
any RAID controller available. I did purchase a spare controller
and I'm about to swap it out and see if it helps, but for some
reason I doubt it. If a controller like that was failing, I would
certainly hope to see some serious error messages or panics going on.
I have been running FreeBSD since version 1.01 and have never had a
box so unstable in the last 12 or so years, especially one that is
supposed to be "server" quality instead of the make-shift ones I put
together with desktop hardware. And last, I'm getting sick of my
Linux admin friends telling me "told you so! should have run
Linux...", please give me something to stick in their pie holes!
Several times now I have had Linux servers (and production quality
ones, not built by me ones :-)) die in a somewhat similar fashion.
In every case the cause has been either a flaky disk or a flaky disk
controller, or some combination.
What seems to happen is that the disk is entirely "lost" by the OS.
At that point any process which never accesses the disk (i.e. is
already in memory) is able to run but the moment any process tries to
access the disk it locks up. So you can't ssh in to the server, but
if you happen to be logged in, your shell is probably cached and keeps
working. If you typed ls recently, you can run ls (but see nothing
or get a cryptic error message like I/O Error), for example.
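
As a rough illustration of the behaviour described above, here is a minimal sketch of a watchdog that tells "disk I/O returns an error" apart from "disk I/O never returns". It assumes the watchdog is already running and resident in memory when the disk disappears; the name probe_disk, the probe file name and the timeout are all hypothetical choices for this example, not part of any existing tool.

```python
import os
import tempfile
import threading


def probe_disk(directory=None, timeout=5.0, io=None):
    """Return "ok", "error" or "hung" for a small write+fsync in `directory`.

    `io` is an optional callable that replaces the real disk access
    (hypothetical hook, useful only for trying the function out).
    """
    directory = directory or tempfile.gettempdir()
    result = {}

    def default_io():
        # Force a real round trip to the device: write, fsync, remove.
        path = os.path.join(directory, ".disk_probe")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, b"probe")
            os.fsync(fd)
        finally:
            os.close(fd)
        os.unlink(path)

    def worker():
        try:
            (io or default_io)()
            result["status"] = "ok"
        except OSError as exc:
            # The kernel noticed the problem and returned e.g. an I/O error.
            result["status"] = "error"
            result["detail"] = str(exc)

    # Do the I/O in a daemon thread so a blocked system call cannot
    # stall the caller; the caller only waits up to `timeout` seconds.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "hung"
    return result["status"]
```

Pointing it at a directory on the suspect volume, e.g. probe_disk("/var/tmp", timeout=10), would be the obvious use. A "hung" result only means the call did not return in time, so a heavily loaded but healthy disk could also trigger it.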
Hmm, that just seems odd that a disk controller just vanishing would
not cause some sort of console message? Even if the disk device is
gone, /dev/console should still be intact to display an error, no?
Also, a disk device that is all of a sudden missing seems pretty
serious to me, since a disk is one of the main devices that modern
OSes cannot run without (generally speaking.) I would think *some*
console message should be warranted.
Not if syslogd tries to access the disk :-( All I can say is that I have
seen three Linux boxes go this way; I've never had this kind of failure
on a BSD box (touch wood) so all I can do is speculate about the
similarities. Also, you did get a console message once, didn't you?
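
One way to keep error reports from depending on the local disk is to send them to another machine over the network. The following is only a sketch of that idea using Python's standard logging.handlers.SysLogHandler over UDP; the function name, the logger name and the assumption that a syslog server is listening on the given host are all illustrative, not something from this thread.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="diskwatch"):
    """Return a logger that sends messages to a syslog server over UDP.

    `host` is assumed to be a separate machine running a syslog
    daemon that accepts network messages (hypothetical setup).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses a UDP socket by default when given (host, port),
    # so emitting a record does not touch the local filesystem.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

A call such as make_remote_logger("loghost.example.org").error("RAID controller not responding") would then go out on the wire even if the local syslogd is stuck waiting on the disk. UDP delivery is not guaranteed, so this is a best-effort channel.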
I'll see if there are any diag programs for the controller and I'll go
ahead and swap the controller out. I wonder if the RAID configuration
is stored in the controller or on the disks? I'd hate to have to
rebuild the server install...
I believe both, and the RAID controller will compare what it thinks it
should see with what it sees on the disks.
If you are moving to a new, identical controller I would have thought
that the worst you would have to do is to reconfigure it to accept the
disks you give it as your specified configuration without it trying to
rebuild anything.
hth,
--Alex