Hi all,

We're having some odd issues with the internal storage on our R900. We get 
hundreds of SCSI sense key errors reported on all the disks, all day long. 
A few times a week these escalate to I/O errors, the drives go offline, and 
the system crashes.

Here are some examples:
(just prior to crash)

May 10 02:05:57 mackey kernel: megasas: [20]waiting for 127 commands to complete
May 10 02:06:02 mackey kernel: megasas: [25]waiting for 127 commands to complete
May 10 02:06:07 mackey kernel: megasas: [30]waiting for 127 commands to complete


May 10 02:08:38 mackey kernel: megasas[0]: Frame addr :0xbfaa4800 : 
<3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x1, lba lo : 0x63c01bf, lba_hi : 0x0, sense_buf addr : 0x37f47b00,sge count : 0x1
May 10 02:08:38 mackey kernel:
May 10 02:08:38 mackey kernel: megasas[0]: Frame addr :0xbfaa4c00 : 
<3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x1, lba lo : 0x63a991f, lba_hi : 0x0, sense_buf addr : 0x37f47b80,sge count : 0x1


May 10 02:08:38 mackey kernel: megasas[0]: Pending Internal cmds in FW :
May 10 02:08:38 mackey kernel: megasas[0]: Dumping Done.
May 10 02:08:38 mackey kernel:
May 10 02:08:38 mackey kernel: megasas: failed to do reset
May 10 02:08:38 mackey kernel: sd 0:2:1:0: megasas: RESET -264942026 cmd=2a 
retries=0
May 10 02:08:38 mackey kernel: megasas: cannot recover from previous reset 
failures
May 10 02:08:38 mackey kernel: sd 0:2:1:0: megasas: RESET -264942026 cmd=2a 
retries=0

May 10 02:08:38 mackey kernel: sd 0:2:0:0: scsi: Device offlined - not ready 
after error recovery
May 10 02:08:38 mackey kernel: sd 0:2:1:0: scsi: Device offlined - not ready 
after error recovery
May 10 02:08:38 mackey last message repeated 106 times


May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 111033375
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 
13879116
May 10 02:08:38 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 
13879117
May 10 02:08:38 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:38 mackey kernel: Buffer I/O error on device dm-5, logical block 
13879118
(dozens more of these I/O errors...)

May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 111033767
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s

May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: Aborting journal on device dm-6.
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: EXT3-fs error (device dm-6): read_block_bitmap: Cannot read block bitmap - block_group = 53, block_bitmap = 1736704
May 10 02:08:38 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: ext3_abort called.
May 10 02:08:38 mackey kernel: EXT3-fs error (device dm-6): 
ext3_journal_start_sb: Detected aborted journal
May 10 02:08:38 mackey kernel: Remounting filesystem read-only
May 10 02:08:38 mackey kernel: Aborting journal on device dm-5.
May 10 02:08:38 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:38 mackey kernel: __journal_remove_journal_head: freeing 
b_committed_data
May 10 02:08:38 mackey kernel: sd 0:2:0:0: rejecting I/O to offline device
May 10 02:08:38 mackey last message repeated 3 times
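
For anyone reading along, here is a minimal sketch of how the `return code = 0x06000000` value in the lines above can be unpacked. It assumes the usual Linux SCSI mid-layer packing of the result word (driver byte in bits 24-31, host byte in 16-23, message byte in 8-15, status byte in 0-7) and the driver-byte names from the 2.6-era include/scsi/scsi.h. The function name `decode_scsi_result` is made up for illustration.

```python
# Driver-byte names, assumed from 2.6-era include/scsi/scsi.h.
DRIVER_BYTE = {
    0x00: "DRIVER_OK",
    0x01: "DRIVER_BUSY",
    0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA",
    0x04: "DRIVER_ERROR",
    0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT",
    0x07: "DRIVER_HARD",
    0x08: "DRIVER_SENSE",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes (hypothetical helper)."""
    driver = (result >> 24) & 0xFF
    return {
        "driver": driver,
        "host": (result >> 16) & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,
        # In these kernels the low nibble is the driver status; the high
        # nibble carries SUGGEST_* hints, so mask it off for the name lookup.
        "driver_name": DRIVER_BYTE.get(driver & 0x0F, "unknown"),
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x06000000))
```

Under those assumptions the driver byte is 0x06 (DRIVER_TIMEOUT) and the other three bytes are zero, which is at least consistent with the "timing out command, waited 360s" lines.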

And the death throes just prior to the crash/reboot. (Obviously the clock is 
off here; I need to fix that. :) )

May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 104737343
May 10 02:08:38 mackey kernel: sd 0:2:1:0: timing out command, waited 360s
May 10 02:08:38 mackey kernel: sd 0:2:1:0: SCSI error: return code = 0x06000000
May 10 02:08:38 mackey kernel: end_request: I/O error, dev sdb, sector 104745543
May 10 02:08:44 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May 10 02:08:44 mackey kernel: printk: 1724 messages suppressed.
May 10 02:08:44 mackey kernel: Buffer I/O error on device dm-5, logical block 
13094162
May 10 02:08:44 mackey kernel: lost page write due to I/O error on dm-5
May 10 02:08:44 mackey kernel: sd 0:2:1:0: rejecting I/O to offline device
May  9 22:27:02 mackey syslogd 1.4.1: restart.
May  9 22:27:02 mackey kernel: klogd 1.4.1, log source = /proc/kmsg started.
May  9 22:27:02 mackey kernel: Linux version 2.6.18-128.1.10.el5 ([email protected]) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 10:35:59 EDT 2009

(after reboot... this is what shows most of the day)


May 10 02:35:52 mackey Server Administrator: Storage Service EventID: 2095  SCSI sense data Sense key:  B Sense code: 4B Sense qualifier:  4:  Physical Disk 1:0:13 Controller 0, Connector 1
May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095  SCSI sense data Sense key:  B Sense code: 4B Sense qualifier:  4:  Physical Disk 1:0:2 Controller 0, Connector 1
May 10 02:35:54 mackey Server Administrator: Storage Service EventID: 2095  SCSI sense data Sense key:  B Sense code: 4B Sense qualifier:  4:  Physical Disk 1:0:1 Controller 0, Connector 1
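
In case it helps with reading these events, below is a rough sketch that pulls the sense key, sense code (ASC) and qualifier (ASCQ) out of one of the Server Administrator lines and looks them up. It assumes OMSA prints the standard SCSI values in hex; the lookup tables are only a small subset of the T10 lists, and the names `decode_sense_line`, `SENSE_KEYS` and `ASC_ASCQ` are mine, not part of any tool.

```python
import re

# Standard SCSI sense keys (SPC); values are hex as printed in the log.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}

# Small subset of the T10 ASC/ASCQ assignments, just enough for these logs.
ASC_ASCQ = {
    (0x4B, 0x00): "DATA PHASE ERROR",
    (0x4B, 0x03): "ACK/NAK TIMEOUT",
    (0x4B, 0x04): "NAK RECEIVED",
}

PATTERN = re.compile(
    r"Sense key:\s*([0-9A-Fa-f]+)\s+"
    r"Sense code:\s*([0-9A-Fa-f]+)\s+"
    r"Sense qualifier:\s*([0-9A-Fa-f]+):\s+"
    r"Physical Disk (\S+)"
)


def decode_sense_line(line):
    """Return disk ID and decoded sense data, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(g, 16) for g in m.group(1, 2, 3))
    return {
        "disk": m.group(4),
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "additional": ASC_ASCQ.get((asc, ascq), "not in this table"),
    }


if __name__ == "__main__":
    sample = (
        "May 10 02:35:52 mackey Server Administrator: Storage Service "
        "EventID: 2095  SCSI sense data Sense key:  B Sense code: 4B "
        "Sense qualifier:  4:  Physical Disk 1:0:13 Controller 0, Connector 1"
    )
    print(decode_sense_line(sample))
```

With those tables, key B / code 4B / qualifier 4 comes out as ABORTED COMMAND with NAK RECEIVED for each of the disks listed.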




Any ideas what could be causing this? These are all internal disks, nothing 
external. The system runs CentOS 5.4, kernel 2.6.18-128.1.10.el5 #1 SMP Thu 
May 7 10:35:59 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


Thanks,
--Chris


Christopher M. Trainor
Manager, IT & Network Operations
Quick Hit, Inc.
o.  508.203.4857
w.  www.quickhit.com


