Hi,

Across a large number of servers I am seeing regular failures of member disks in RAID1 arrays. SMART reports the disks as healthy (and the attributes confirm this), and there seems to be no explanation for why this is happening. In most cases the disk is removed and rediscovered, and the array then resyncs successfully. Has anyone else seen similar problems?

The systems are running CentOS5 x86_64.

Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:29 kernel: mptbase: ioc0: PhysDisk is now missing
Nov 23 23:13:29 kernel: mptsas: ioc0: removing sata device, channel 0, id 8, phy 0
Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:29 kernel: mptbase: ioc0: PhysDisk is now missing, out of sync
Nov 23 23:13:29 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Nov 23 23:13:29 kernel: mptbase: ioc0: volume is now degraded, enabled
Nov 23 23:13:31 kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8
Nov 23 23:13:31 kernel: mptbase: ioc0: PhysDisk is now online, out of sync
Nov 23 23:13:31 kernel: mptsas: ioc0: attaching sata device, channel 0, id 8, phy 0
Nov 23 23:13:31 kernel: Vendor: ATA Model: ST3500320AS Rev: SD1A
Nov 23 23:13:31 kernel: Type: Direct-Access ANSI SCSI revision: 05
Nov 23 23:13:33 kernel: scsi 0:0:2:0: Attached scsi generic sg0 type 0
Nov 23 23:13:34 kernel: mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
Nov 23 23:13:35 kernel: mptbase: ioc0: volume is now degraded, enabled, resync in progress

In some instances this message is preceded by:

Nov 19 11:55:25 kernel: command: Read(10): 28 00 07 e1 e7 59 00 00 08 00
Nov 19 11:55:25 kernel: mptscsih: ioc0: Issue of TaskMgmt failed!
Nov 19 11:55:25 kernel: mptscsih: ioc0: task abort: FAILED (sc=ffff810114710e00)
Nov 19 11:55:25 kernel: mptscsih: ioc0: attempting target reset! (sc=ffff81010eab6200)
Nov 19 11:55:25 kernel: sd 0:1:0:0: (repeated 10-20 times)

firmware: MPTFW-00.25.47.00-IE
mptbase/mptsas driver: version: 3.04.07

This is the driver which ships with CentOS5. I have tried the driver version 4.00.38.02 [1] from Dell, but this does not help.
Unfortunately Dell only provide drivers for RHEL Update 3 (we're now at Update 4), and have advised me that Update 4 drivers may not be available until the new year at the earliest (how it takes ~3 months to release a driver I'm not really sure). LSI, perhaps understandably, refuse to correspond on this matter since this is an OEM board.

[1] http://ftp.dell.com/sas-raid/R211003-mptlinux-4.00.38.02.txt

Regards,
Monty
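
P.S. For anyone who wants to check whether their own machines show the same pattern, here is a rough sketch of how the drop-out events could be counted from syslog. It only relies on the two mptbase message formats quoted in the log excerpt above; the function name count_dropouts and the idea of pairing each "now missing" line with the preceding status-change line are my own assumptions for illustration, not anything provided by the driver or by Dell.

```python
import re
from collections import Counter

# Status-change line that names the disk, e.g.
# "kernel: mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 1 id=8"
CHANGE_RE = re.compile(
    r"mptbase: (ioc\d+): RAID STATUS CHANGE for PhysDisk (\d+) id=(\d+)"
)
# Follow-up line reporting the drop-out. Anchoring at end of line skips the
# "now missing, out of sync" variant, so each event is counted only once.
MISSING_RE = re.compile(r"mptbase: (ioc\d+): PhysDisk is now missing\s*$")


def count_dropouts(lines):
    """Count 'PhysDisk is now missing' events per (ioc, physdisk, id).

    Hypothetical helper: assumes the 'missing' line follows the status-change
    line for the same ioc, as it does in the excerpt above.
    """
    counts = Counter()
    last_disk = {}  # ioc -> (physdisk, id) from the latest status-change line
    for line in lines:
        m = CHANGE_RE.search(line)
        if m:
            last_disk[m.group(1)] = (m.group(2), m.group(3))
            continue
        m = MISSING_RE.search(line)
        if m and m.group(1) in last_disk:
            counts[(m.group(1),) + last_disk[m.group(1)]] += 1
    return counts
```

Passing an open log file to count_dropouts (for example /var/log/messages, assuming that is where kernel messages end up on your systems) gives a per-disk count that can be compared across hosts.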
