Hi all,

We're having some odd issues with the internal storage on our R900. We get hundreds of SCSI sense key errors reported on all the disks, all day long. A few times a week this escalates to I/O errors, the drives go offline, and the system crashes.

Here are some examples (just prior to the crash):

May 10 02:05:57 mackey kernel: megasas: [20]waiting for 127 commands to complete
May 10 02:06:02 mackey kernel: megasas: [25]waiting for 127 commands to complete
May 10 02:06:07 mackey kernel: megasas: [30]waiting for 127 commands to complete
May 10 02:08:38 mackey kernel: megasas[0]: Frame addr :0xbfaa4800 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x1, lba lo : 0x63c01bf, lba_hi : 0x0, sense_buf addr : 0x37f47b00,sge count : 0x1
May 10 02:08:38 mackey kernel:
May 10 02:08:38 mackey kernel: megasas[0]: Frame addr :0xbfaa4c00 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x1, lba lo : 0x63a991f, lba_hi : 0x0, sense_buf addr : 0x37f47b80,sge count : 0x1
May 10 02:08:38 mackey kernel: megasas[0]: Pending Internal cmds in FW :
May 10 02:08:38 mackey kernel: megasas[0]: Dumping Done.
May 10 02:08:38 mackey kernel:
May 10 02:08:38 mackey kernel: megasas: failed to do reset
May 10 02:08:38 mackey kernel: sd 0:2:1:0: megasas: RESET -264942026 cmd=2a retries=0
May 10 02:08:38 mackey kernel: megasas: cannot recover from previous reset failures
May 10 02:08:38 mackey kernel: sd 0:2:1:0: megasas: RESET -264942026 cmd=2a retries=0
May 10 02:08:38 mackey kernel: sd 0:2:0:0: scsi: Device offlined - not ready after error recovery
May 10 02:08:38 mackey kernel: sd 0:2:1:0: scsi: Device offlined - not ready after error recovery
May 10 02:08:38 mackey last message repeated 106 times
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 111033375
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 13879116
May 10 02:08:38 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 13879117
May 10 02:08:38 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 13879118
(dozens more of the I/O errors...)
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 111033767
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: Aborting journal on device dm-6.
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: EXT3-fs error (device dm-6): read_block_bitmap: Cannot read block bitmap - block_group = 53, block_bitmap = 1736704
May 10 02:08:38 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: ext3_abort called.
May 10 02:08:38 mackey kernel: EXT3-fs error (device dm-6): ext3_journal_start_sb: Detected aborted journal
May 10 02:08:38 mackey kernel: Remounting filesystem read-only
May 10 02:08:38 mackey kernel: Aborting journal on device dm-5.
May 10 02:08:38 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: __journal_remove_journal_head: freeing b_committed_data
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey last message repeated 3 times

And the death throes just prior to crashing/reboot (obviously the clock is off here; need to fix that :) ):

May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 104737343
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 104745543
May 10 02:08:44 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:44 mackey kernel: printk: 1724 messages suppressed.
May 10 02:08:44 mackey kernel: Buffer I/O error on device dm-5, logical block 13094162
May 10 02:08:44 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:44 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May  9 22:27:02 mackey syslogd 1.4.1: restart.
May  9 22:27:02 mackey kernel: klogd 1.4.1, log source = /proc/kmsg started.
May  9 22:27:02 mackey kernel: Linux version 2.6.18-128.1.10.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 10:35:59 EDT 2009

After the reboot, this is what shows most of the day:

May 10 02:35:52 mackey Server Administrator: Storage Service EventID: 2095 SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: Physical Disk 1:0:13 Controller 0, Connector 1
May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095 SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: Physical Disk 1:0:2 Controller 0, Connector 1
May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095 SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: Physical Disk 1:0:1 Controller 0, Connector 1

Any ideas what could be causing this? This is all internal disk, nothing external.

CentOS 5.4
kernel 2.6.18-128.1.10.el5 #1 SMP Thu May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
--Chris

Christopher M. Trainor
Manager, IT & Network Operations
Quick Hit, Inc.
o. 508.203.4857
w. www.quickhit.com
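
For anyone trying to see whether the sense data errors really are spread evenly across all disks, a rough tally per physical disk may help. The following is only a sketch: the function name `count_sense_events` and the regular expression are illustrative assumptions, written against the three "EventID: 2095" lines quoted in this post, and may need adjusting for other Server Administrator versions or log layouts.

```python
import re
from collections import Counter

# Assumed layout, taken from the "EventID: 2095" lines quoted in this post.
SENSE_RE = re.compile(
    r"EventID:\s*2095\s+SCSI sense data\s+"
    r"Sense key:\s*(?P<key>[0-9A-Fa-f]+)\s+"
    r"Sense code:\s*(?P<code>[0-9A-Fa-f]+)\s+"
    r"Sense qualifier:\s*(?P<qual>[0-9A-Fa-f]+):\s*"
    r"Physical Disk\s+(?P<disk>[\d:]+)"
)


def count_sense_events(lines):
    """Count sense data events per (disk, key, code, qualifier) tuple."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line)
        if match:
            counts[(
                match["disk"],
                match["key"].upper(),
                match["code"].upper(),
                match["qual"].upper(),
            )] += 1
    return counts


# The three sample lines from the post, used here as a smoke test.
SAMPLE = [
    "May 10 02:35:52 mackey Server Administrator: Storage Service EventID: 2095 "
    "SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: "
    "Physical Disk 1:0:13 Controller 0, Connector 1",
    "May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095 "
    "SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: "
    "Physical Disk 1:0:2 Controller 0, Connector 1",
    "May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095 "
    "SCSI sense data Sense key: B Sense code: 4B Sense qualifier: 4: "
    "Physical Disk 1:0:1 Controller 0, Connector 1",
]

for (disk, key, code, qual), n in sorted(count_sense_events(SAMPLE).items()):
    print(f"{n:6d}  disk {disk}  key={key} code={code} qualifier={qual}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_sense_events(open("/var/log/messages", errors="replace"))`; the path is an assumption about where syslog writes on this host. If the counts are roughly equal on every disk, that points toward something shared rather than a single drive, though the tally alone cannot prove either.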
_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
