I am having trouble with MPS becoming unresponsive in certain disk failure 
conditions. So far, I've experienced this with 3TB Hitachi disks (0S03208) and 
3TB Seagate Barracuda disks (ST3000DM001, firmware CC9D) while using the MPS 
driver with an LSI SAS2116 controller on FreeBSD 8.2-STABLE.

In these particular instances, the disks are part of a zpool of mirrors. When a 
disk fails, I generally see a message like "kernel: (da5:mps0:0:5:0): SCSI 
command timeout on device handle 0x0017 SMID 148", followed by an indefinite 
number of "mps0: (0:5:0) terminated ioc 804b scsi 0 state c xfer 65536" 
messages.
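
For anyone who wants to pick the affected device out of a flood of these 
messages, here is a minimal sketch. It assumes the log lines look exactly like 
the two samples quoted above; the regular expressions and the summarize() 
helper are my own guesses for illustration, not anything the mps driver 
provides, and other driver versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns inferred ONLY from the two sample messages quoted above
# (assumption: other mps driver versions may log a different format).
TIMEOUT_RE = re.compile(
    r"\((?P<dev>da\d+):(?P<ctrl>mps\d+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"SCSI command timeout on device handle (?P<handle>0x[0-9a-fA-F]+) SMID (?P<smid>\d+)"
)
TERMINATED_RE = re.compile(
    r"(?P<ctrl>mps\d+): \((?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\) terminated ioc"
)


def summarize(lines):
    """Hypothetical helper: map each (ctrl, bus, target, lun) path to its
    da device name and count the 'terminated' messages seen per path."""
    devices = {}
    terminated = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            key = (m["ctrl"], m["bus"], m["target"], m["lun"])
            devices[key] = m["dev"]
            continue
        m = TERMINATED_RE.search(line)
        if m:
            key = (m["ctrl"], m["bus"], m["target"], m["lun"])
            terminated[key] += 1
    return devices, terminated


# The sample lines are the messages quoted in this post.
sample = [
    "kernel: (da5:mps0:0:5:0): SCSI command timeout on device handle 0x0017 SMID 148",
    "mps0: (0:5:0) terminated ioc 804b scsi 0 state c xfer 65536",
    "mps0: (0:5:0) terminated ioc 804b scsi 0 state c xfer 65536",
]
devices, terminated = summarize(sample)
for key, count in terminated.items():
    print(devices.get(key, "unknown"), key, count)
```

This only identifies which disk is generating the errors; it does nothing to 
stop the driver from retrying it.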

What I want to happen in this case is for the disk to simply go offline in the 
zpool, so that the pool can continue functioning. However, the pool status 
still shows the disk as online. Any attempt to disable the disk (such as with 
zpool offline, remove, or detach) hangs and never completes, as does attempting 
a rescan with camcontrol. Of course, any attempt to access data in the pool 
hangs as well.

Rebooting the system in this state is also bad; when the disk is first 
discovered, it will begin a cycle of mps scsi errors during startup that never 
seem to stop. The only way to recover, at least that I know of, is to 
physically remove the disk from the chassis. Once I do that, the system 
continues running perfectly.

Basically my question is this: How can I get MPS to ignore a failed disk and 
never attempt to access it again? I don't care if it does so automatically, or 
if I need to perform some administrative operation to drop the device 
reference. I've seen a number of people on the list having problems that appear 
similar to this, but those seem to have more to do with firmware or 
compatibility issues. In my case, these disks are definitely dead... they no 
longer work in any other systems, and often make sad clicking noises.

I suppose this is also something that ZFS could do, independent of the driver. 
If a device is unresponsive, shouldn't ZFS take it offline on its own?

        - .Dustin
