Hi,

Across a large number of servers I am seeing regular failures of member
disks in RAID1 arrays. SMART reports the disks as healthy (and the
attributes confirm this). There seems to be no explanation for why
this is happening. In most cases the disk is removed, rediscovered,
and then the array resyncs successfully. Has anyone else seen similar
problems?

The systems are running CentOS 5 x86_64.

Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:29 kernel: mptbase: ioc0:   PhysDisk is now missing
Nov 23 23:13:29 kernel: mptsas: ioc0: removing sata device, channel 0, id 8, phy 0
Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:29 kernel: mptbase: ioc0:   PhysDisk is now missing, out of sync
Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Nov 23 23:13:29 kernel: mptbase: ioc0:   volume is now degraded, enabled
Nov 23 23:13:31 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:31 kernel: mptbase: ioc0:   PhysDisk is now online, out of sync
Nov 23 23:13:31 kernel: mptsas: ioc0: attaching sata device, channel 0, id 8, phy 0
Nov 23 23:13:31 kernel:   Vendor: ATA       Model: ST3500320AS       Rev: SD1A
Nov 23 23:13:31 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov 23 23:13:33 kernel: scsi 0:0:2:0: Attached scsi generic sg0 type 0
Nov 23 23:13:34 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Nov 23 23:13:35 kernel: mptbase: ioc0:   volume is now degraded, enabled, resync in progress
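
For anyone wanting to compare notes across many servers, here is a rough
sketch of how the drop-outs could be tallied from syslog. It is not
something I have run in production: the function name count_dropouts and
the two regular expressions are my own and assume the mptbase message
layout shown above, where a "RAID STATUS CHANGE for PhysDisk" line is
followed by a line carrying the new state. Only the plain "missing" state
is counted, so the later "missing, out of sync" report for the same event
is not counted twice.

```python
import re
from collections import Counter

# Header line that names the disk, e.g.
# "... mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8"
CHANGE_RE = re.compile(
    r"mptbase: (ioc\d+): RAID STATUS CHANGE for PhysDisk (\d+) id=(\d+)"
)
# Follow-up line that carries the new state, e.g.
# "... mptbase: ioc0:   PhysDisk is now missing"
STATE_RE = re.compile(r"mptbase: (ioc\d+):\s+PhysDisk is now (.+)$")


def count_dropouts(lines):
    """Count 'missing' transitions per (controller, PhysDisk number, id)."""
    counts = Counter()
    current = None  # disk named by the most recent header line
    for line in lines:
        line = line.rstrip()
        header = CHANGE_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)), int(header.group(3)))
            continue
        state = STATE_RE.search(line)
        if state and current is not None:
            # Count only the bare "missing" state; "missing, out of sync"
            # is a second report of the same drop-out.
            if state.group(2) == "missing":
                counts[current] += 1
            current = None
    return counts
```

Fed the lines of a syslog file (for example an open file handle on
/var/log/messages, assuming that is where kernel messages land), it
returns a Counter keyed by controller, PhysDisk number and id, which
makes it easier to see whether the same slots keep dropping.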

In some instances this message is preceded by:

Nov 19 11:55:25 kernel:         command: Read(10): 28 00 07 e1 e7 59 00 00 08 00
Nov 19 11:55:25 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Nov 19 11:55:25 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff810114710e00)
Nov 19 11:55:25 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81010eab6200)
Nov 19 11:55:25 kernel: sd 0:1:0:0:
(repeated 10-20 times)

firmware: MPTFW-00.25.47.00-IE

mptbase/mptsas driver:
version:        3.04.07
This is the driver that ships with CentOS 5.

I have tried driver version 4.00.38.02 [1] from Dell, but it does not
help. Unfortunately, Dell only provides drivers for RHEL Update 3
(we're now at Update 4) and has advised me that Update 4 drivers may
not be available until the new year at the earliest (how it takes
~3 months to release a driver, I'm not really sure).

LSI, perhaps understandably, refuse to correspond on this matter since
this is an OEM board.

[1] http://ftp.dell.com/sas-raid/R211003-mptlinux-4.00.38.02.txt

Regards,
Monty

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq